---
title: VQA
emoji: 🚀
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# VizWiz Visual Question Answering API

This repository contains a FastAPI backend for a Visual Question Answering (VQA) system trained on the VizWiz dataset.

## Features

- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI

## Project Structure

```
project_root/
├── app/
│   ├── main.py                # Main FastAPI application
│   ├── models/                # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py       # VQA model implementation
│   ├── routers/               # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py             # VQA-related endpoints
│   ├── services/              # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py   # Model loading and inference
│   │   └── session_service.py # Session management
│   ├── utils/                 # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py     # Image processing utilities
│   └── config.py              # Application configuration
├── models/                    # Directory for model files
├── uploads/                   # Directory for uploaded images
├── .env                       # Environment variables
└── requirements.txt           # Project dependencies
```

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/dixisouls/vizwiz-vqa-api.git
   cd vizwiz-vqa-api
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create the necessary directories:

   ```bash
   mkdir -p models uploads
   ```

5. Place your trained model in the `models/` directory.

6. Update the `.env` file with your configuration.

## Running the Application

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000.

API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

## API Endpoints

### Health Check

```
GET /health
```

Returns the health status of the API.
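
For example, a quick liveness check with Python's `requests` (assuming the server is running locally on port 8000):

```python
import requests

# Quick liveness check against a locally running instance.
resp = requests.get("http://localhost:8000/health")
print(resp.status_code, resp.json())
```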

### Upload Image

```
POST /api/vqa/upload
```

Upload an image and create a new session.
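
A minimal client sketch: the multipart field name (`file`) and the response key (`session_id`) are assumptions here, so check the Swagger UI at `/docs` for the exact schema.

```python
import requests

BASE_URL = "http://localhost:8000"

# Upload an image as multipart form data. The field name "file" and
# the "session_id" response key are assumptions; verify via /docs.
with open("example.jpg", "rb") as f:
    resp = requests.post(f"{BASE_URL}/api/vqa/upload", files={"file": f})
resp.raise_for_status()
session_id = resp.json()["session_id"]
print("Created session:", session_id)
```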

### Ask Question

```
POST /api/vqa/ask
```

Ask a question about an uploaded image.
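
A sketch of asking a question in an existing session; the request and response field names (`session_id`, `question`, `answer`, `confidence`) are assumptions based on the feature list above.

```python
import requests

BASE_URL = "http://localhost:8000"
session_id = "your-session-id"  # returned by /api/vqa/upload

# Field names here are assumptions; per the feature list, the API
# returns an answer together with a confidence score.
payload = {"session_id": session_id, "question": "What color is the mug?"}
resp = requests.post(f"{BASE_URL}/api/vqa/ask", json=payload)
resp.raise_for_status()
result = resp.json()
print(result.get("answer"), result.get("confidence"))
```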

### Get Session

```
GET /api/vqa/session/{session_id}
```

Get session information including question history.

### Reset Session

```
DELETE /api/vqa/session/{session_id}
```

Reset a session to start fresh.
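
Together, a sketch of inspecting and then resetting a session (paths from the endpoint list above; the response shape is an assumption):

```python
import requests

BASE_URL = "http://localhost:8000"
session_id = "your-session-id"  # returned by /api/vqa/upload

# Fetch session info, including the question history.
info = requests.get(f"{BASE_URL}/api/vqa/session/{session_id}")
print(info.json())

# Reset the session to start fresh.
requests.delete(f"{BASE_URL}/api/vqa/session/{session_id}")
```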

## Environment Variables

- `DEBUG`: Enable debug mode (default: `False`)
- `MODEL_PATH`: Path to the trained model (default: `./models/vqa_model_best.pt`)
- `TEXT_MODEL`: Name of the text model (default: `bert-base-uncased`)
- `VISION_MODEL`: Name of the vision model (default: `google/vit-base-patch16-384`)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: `./uploads`)
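
A sample `.env` mirroring the defaults above (the token value is a placeholder):

```
DEBUG=False
MODEL_PATH=./models/vqa_model_best.pt
TEXT_MODEL=bert-base-uncased
VISION_MODEL=google/vit-base-patch16-384
# Replace with your own Hugging Face token
HUGGINGFACE_TOKEN=hf_your_token_here
UPLOAD_DIR=./uploads
```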

## License

This project is licensed under the MIT License.