---
title: VQA
emoji: π
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# VizWiz Visual Question Answering API

This repository contains a FastAPI backend for a Visual Question Answering (VQA) system trained on the VizWiz dataset.

## Features

- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI

## Project Structure

```
project_root/
├── app/
│   ├── main.py                 # Main FastAPI application
│   ├── models/                 # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py        # VQA model implementation
│   ├── routers/                # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py              # VQA-related endpoints
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py    # Model loading and inference
│   │   └── session_service.py  # Session management
│   ├── utils/                  # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py      # Image processing utilities
│   └── config.py               # Application configuration
├── models/                     # Directory for model files
├── uploads/                    # Directory for uploaded images
├── .env                        # Environment variables
└── requirements.txt            # Project dependencies
```

## Installation

1. Clone the repository:

```bash
git clone https://github.com/dixisouls/vizwiz-vqa-api.git
cd vizwiz-vqa-api
```

2. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Create the necessary directories:

```bash
mkdir -p models uploads
```

5. Place your trained model in the `models` directory. The default `MODEL_PATH` expects it at `./models/vqa_model_best.pt`.

6. Update the `.env` file with your configuration (see Environment Variables).

## Running the Application

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000. The `--reload` flag restarts the server when source files change and is intended for development.

API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

## API Endpoints

### Health Check

```
GET /health
```

Returns the health status of the API.
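
For monitoring scripts, a minimal sketch of a health probe using only the Python standard library is shown here. It assumes the server runs at `http://localhost:8000` as in "Running the Application" and checks only the HTTP status code, since the response body format is not documented here. The helper names `health_request` and `is_healthy` are illustrative and not part of this repository.

```python
import urllib.request

# Assumption: local development server from "Running the Application".
BASE_URL = "http://localhost:8000"


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the /health endpoint."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/health", method="GET")


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if the API answers /health with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_request(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Calling `is_healthy()` while the server is running should return `True`; it returns `False` if the server is unreachable or responds with an error status.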

### Upload Image

```
POST /api/vqa/upload
```

Upload an image and create a new session.
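
An upload from a Python client could look like the sketch below, which builds a `multipart/form-data` request with the standard library. The form field name `file` is an assumption (check the Swagger UI at `/docs` for the actual name), and `build_upload_request` is a hypothetical helper, not part of this repository.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(
    image_path: str,
    base_url: str = "http://localhost:8000",
    field_name: str = "file",  # assumption: the real form field name may differ
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for /api/vqa/upload."""
    path = Path(image_path)
    boundary = uuid.uuid4().hex  # random delimiter between multipart sections
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{field_name}"; '
         f'filename="{path.name}"\r\n').encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/vqa/upload",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. The response is expected to identify the new session, but its exact fields are not documented here, so inspect the returned JSON before relying on a particular key.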

### Ask Question

```
POST /api/vqa/ask
```

Ask a question about an uploaded image.
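
As a rough illustration, the request for a question might be built as follows. This sketch assumes the endpoint accepts a JSON body with `session_id` and `question` fields; both field names and the helper `build_ask_request` are assumptions, so verify the actual schema in the Swagger UI.

```python
import json
import urllib.request


def build_ask_request(
    session_id: str,
    question: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for /api/vqa/ask."""
    # Assumption: the API expects these two JSON fields.
    payload = {"session_id": session_id, "question": question}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/vqa/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the session keeps the uploaded image, the same `session_id` can be reused for several questions without uploading again.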

### Get Session

```
GET /api/vqa/session/{session_id}
```

Get session information including question history.

### Reset Session

```
DELETE /api/vqa/session/{session_id}
```

Reset a session to start fresh.
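
To make the session lifecycle behind these endpoints concrete, here is a hypothetical in-memory sketch: uploading creates a session, each answered question is appended to its history, and a reset discards it. This is not the code in `session_service.py`; the class and method names are illustrative, and treating a reset as removal of the session is an assumption.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    image_path: str
    # One record per answered question: question, answer, confidence.
    history: List[dict] = field(default_factory=list)


class InMemorySessionStore:
    """Hypothetical store; the repository's session service may differ."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def create(self, image_path: str) -> str:
        """Register an uploaded image and return a new session ID."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = Session(image_path=image_path)
        return session_id

    def add_answer(self, session_id: str, question: str,
                   answer: str, confidence: float) -> None:
        """Append one question/answer record to the session history."""
        self._sessions[session_id].history.append(
            {"question": question, "answer": answer, "confidence": confidence}
        )

    def get(self, session_id: str) -> Optional[Session]:
        """Return the session, or None if the ID is unknown."""
        return self._sessions.get(session_id)

    def reset(self, session_id: str) -> bool:
        """Discard the session; return False if it did not exist."""
        return self._sessions.pop(session_id, None) is not None
```

An in-memory dictionary like this loses all sessions on restart and is not shared across worker processes, which is acceptable for a sketch but worth keeping in mind for deployment.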

## Environment Variables

- `DEBUG`: Enable debug mode (default: `False`)
- `MODEL_PATH`: Path to the trained model (default: `./models/vqa_model_best.pt`)
- `TEXT_MODEL`: Name of the text model (default: `bert-base-uncased`)
- `VISION_MODEL`: Name of the vision model (default: `google/vit-base-patch16-384`)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: `./uploads`)
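
The variables and defaults listed above could be read into a typed settings object along these lines. This is a sketch under stated assumptions, not the contents of `app/config.py`: the `Settings` class, `load_settings`, and the set of strings accepted as true for `DEBUG` are all illustrative.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    debug: bool
    model_path: str
    text_model: str
    vision_model: str
    huggingface_token: Optional[str]  # no documented default
    upload_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        debug=_as_bool(env.get("DEBUG", "False")),
        model_path=env.get("MODEL_PATH", "./models/vqa_model_best.pt"),
        text_model=env.get("TEXT_MODEL", "bert-base-uncased"),
        vision_model=env.get("VISION_MODEL", "google/vit-base-patch16-384"),
        huggingface_token=env.get("HUGGINGFACE_TOKEN"),
        upload_dir=env.get("UPLOAD_DIR", "./uploads"),
    )
```

This sketch reads the process environment only. Values in the `.env` file must first be loaded into the environment, for example by a loader such as `python-dotenv`; whether this project uses that package is not stated here.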

## License

[MIT License](LICENSE)