---
title: VQA
emoji: 🚀
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---
Check out the configuration reference at
https://huggingface.co/docs/hub/spaces-config-reference
# VizWiz Visual Question Answering API
This repository contains a FastAPI backend for a Visual Question Answering (VQA)
system trained on the VizWiz dataset.
## Features
- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI
## Project Structure
```
project_root/
├── app/
│   ├── main.py                  # Main FastAPI application
│   ├── models/                  # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py         # VQA model implementation
│   ├── routers/                 # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py               # VQA-related endpoints
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py     # Model loading and inference
│   │   └── session_service.py   # Session management
│   ├── utils/                   # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py       # Image processing utilities
│   └── config.py                # Application configuration
├── models/                      # Directory for model files
├── uploads/                     # Directory for uploaded images
├── .env                         # Environment variables
└── requirements.txt             # Project dependencies
```
## Installation
1. Clone the repository:
```bash
git clone https://github.com/dixisouls/vizwiz-vqa-api.git
cd vizwiz-vqa-api
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Create necessary directories:
```bash
mkdir -p models uploads
```
5. Place your trained model in the `models` directory. The default `MODEL_PATH` expects it at `./models/vqa_model_best.pt`.
6. Update the `.env` file with your configuration (see Environment Variables).
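Before starting the server, it can help to confirm that steps 4 to 6 left the expected files in place. The following is a minimal sketch, not part of the repository: the helper name `missing_setup_items` is hypothetical, and it assumes the model is stored at the default `MODEL_PATH` (`./models/vqa_model_best.pt`).

```python
from pathlib import Path


def missing_setup_items(project_root="."):
    """Return the setup items from the installation steps that are absent."""
    root = Path(project_root)
    missing = []

    # Step 4: the two working directories.
    for name in ("models", "uploads"):
        if not (root / name).is_dir():
            missing.append(name + "/")

    # Step 6: the environment file.
    if not (root / ".env").is_file():
        missing.append(".env")

    # Step 5: ASSUMPTION - the model lives at the default MODEL_PATH.
    # Adjust this path if MODEL_PATH is overridden in .env.
    if not (root / "models" / "vqa_model_best.pt").is_file():
        missing.append("models/vqa_model_best.pt")

    return missing
```

Calling `missing_setup_items()` from the project root returns an empty list when everything is in place, and otherwise lists what still needs to be created.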
## Running the Application
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
The API will be available at http://localhost:8000.
API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## API Endpoints
### Health Check
```
GET /health
```
Returns the health status of the API.
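For monitoring or startup scripts, the health endpoint can be polled until the API responds. The sketch below uses only the Python standard library. It assumes that a healthy API answers `GET /health` with HTTP 200; the response body is not documented here, so it is not inspected. The function names `fetch_status` and `wait_until_healthy` are illustrative, not part of this project.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_healthy(base_url="http://localhost:8000", attempts=10,
                       delay=1.0, fetch=fetch_status):
    """Poll GET /health until it returns 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        # ASSUMPTION: HTTP 200 means the API is healthy.
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `fetch` parameter exists so the polling logic can be exercised without a running server; in normal use, `wait_until_healthy()` with the defaults is enough.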
### Upload Image
```
POST /api/vqa/upload
```
Upload an image and create a new session.
### Ask Question
```
POST /api/vqa/ask
```
Ask a question about an uploaded image.
### Get Session
```
GET /api/vqa/session/{session_id}
```
Get session information including question history.
### Reset Session
```
DELETE /api/vqa/session/{session_id}
```
Reset a session to start fresh.
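The session endpoints above can be called from a small standard-library client. This is a sketch under stated assumptions: the paths and HTTP methods come from this README, but the request body for `/api/vqa/ask` (a JSON object with `session_id` and `question` fields) is a guess, and the response shapes are not documented here. The upload call is left out because the multipart field name is not specified. Check the Swagger UI at `/docs` for the actual schemas before relying on any of this.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"


def build_request(method, path, payload=None):
    """Build (but do not send) a request against the API."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def ask_request(session_id, question):
    # ASSUMPTION: the endpoint accepts a JSON body with these field names.
    payload = {"session_id": session_id, "question": question}
    return build_request("POST", "/api/vqa/ask", payload)


def get_session_request(session_id):
    return build_request("GET", "/api/vqa/session/" + quote(session_id, safe=""))


def reset_session_request(session_id):
    return build_request("DELETE", "/api/vqa/session/" + quote(session_id, safe=""))


def send(request, timeout=10.0):
    """Send a prepared request and decode the response as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running and a session ID obtained from the upload endpoint, `send(ask_request(session_id, "What is in this image?"))` would submit a question. Building requests separately from sending them keeps the URLs and payloads easy to inspect.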
## Environment Variables
- `DEBUG`: Enable debug mode (default: False)
- `MODEL_PATH`: Path to the trained model (default: ./models/vqa_model_best.pt)
- `TEXT_MODEL`: Name of the text model (default: bert-base-uncased)
- `VISION_MODEL`: Name of the vision model (default: google/vit-base-patch16-384)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: ./uploads)
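To illustrate how these variables and their defaults fit together, here is a standard-library sketch of a settings loader. It is not the project's `app/config.py`, which may be implemented differently. The `Settings` class and `load_settings` function are hypothetical names, and the rule for parsing `DEBUG` (treating `1`, `true`, `yes`, and `on` as enabled) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the values listed in this README.
    debug: bool = False
    model_path: str = "./models/vqa_model_best.pt"
    text_model: str = "bert-base-uncased"
    vision_model: str = "google/vit-base-patch16-384"
    huggingface_token: Optional[str] = None
    upload_dir: str = "./uploads"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    defaults = Settings()

    # ASSUMPTION: these strings count as "debug enabled".
    debug = env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes", "on"}

    return Settings(
        debug=debug,
        model_path=env.get("MODEL_PATH", defaults.model_path),
        text_model=env.get("TEXT_MODEL", defaults.text_model),
        vision_model=env.get("VISION_MODEL", defaults.vision_model),
        # An empty token is treated the same as a missing one.
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        upload_dir=env.get("UPLOAD_DIR", defaults.upload_dir),
    )
```

Calling `load_settings()` reads from the process environment, while passing a plain dictionary makes it easy to check how overrides behave without touching real environment variables.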
## License
[MIT License](LICENSE)