metadata
title: VQA
emoji: π
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
VizWiz Visual Question Answering API
This repository contains a FastAPI backend for a Visual Question Answering (VQA) system trained on the VizWiz dataset.
Features
- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI
Project Structure
project_root/
├── app/
│   ├── main.py                 # Main FastAPI application
│   ├── models/                 # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py        # VQA model implementation
│   ├── routers/                # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py              # VQA-related endpoints
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py    # Model loading and inference
│   │   └── session_service.py  # Session management
│   ├── utils/                  # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py      # Image processing utilities
│   └── config.py               # Application configuration
├── models/                     # Directory for model files
├── uploads/                    # Directory for uploaded images
├── .env                        # Environment variables
└── requirements.txt            # Project dependencies
Installation
- Clone the repository:
git clone https://github.com/dixisouls/vizwiz-vqa-api.git
cd vizwiz-vqa-api
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Create necessary directories:
mkdir -p models uploads
- Place your trained model in the models directory.
- Update the .env file with your configuration.
Running the Application
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
The API will be available at http://localhost:8000.
API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
API Endpoints
Health Check
GET /health
Returns the health status of the API.
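As a rough illustration of how a monitoring script might poll this endpoint, here is a minimal sketch using only the Python standard library. The base URL is the one from "Running the Application"; the helper names health_url and is_healthy are hypothetical, and the sketch assumes only that a healthy API answers with HTTP 200, since the response body is not documented here.

```python
import urllib.error
import urllib.request

# Address from "Running the Application"; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def health_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the health endpoint (hypothetical helper)."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise.

    Assumption: a 200 status means "healthy"; the response body is ignored
    because its shape is not specified in this README.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


if __name__ == "__main__":
    print("API healthy:", is_healthy())
```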
Upload Image
POST /api/vqa/upload
Upload an image and create a new session.
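A FastAPI upload endpoint typically expects a multipart/form-data body, so a client call could look roughly like the sketch below. The form field name "file" is an assumption (check the Swagger UI at /docs for the real parameter name), and build_upload_request is a hypothetical helper. It builds the request without sending it.

```python
import mimetypes
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"


def build_upload_request(
    filename: str,
    data: bytes,
    field_name: str = "file",  # ASSUMPTION: actual field name may differ
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /api/vqa/upload."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # One file part, followed by the closing boundary.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        base_url.rstrip("/") + "/api/vqa/upload",
        data=head + data + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it, pass the result to urllib.request.urlopen. The response presumably carries the identifier of the new session, but its exact shape is not documented here.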
Ask Question
POST /api/vqa/ask
Ask a question about an uploaded image.
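The request format for this endpoint is not spelled out in this README, so the following is only a sketch under stated assumptions: that the endpoint accepts a JSON body, and that the fields are named session_id and question. Both field names and the helper build_ask_request are hypothetical; the Swagger UI at /docs shows the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_ask_request(
    session_id: str,
    question: str,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/vqa/ask.

    ASSUMPTIONS: JSON body, with fields "session_id" and "question".
    """
    payload = json.dumps(
        {"session_id": session_id, "question": question}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/vqa/ask",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

If the endpoint turns out to expect form fields instead of JSON, only the body encoding and the Content-Type header need to change.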
Get Session
GET /api/vqa/session/{session_id}
Get session information including question history.
Reset Session
DELETE /api/vqa/session/{session_id}
Reset a session to start fresh.
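The two session endpoints share the same path and differ only in HTTP method, which a small helper can capture. This is a sketch; session_request is a hypothetical name, and the session ID is percent-encoded defensively because its format is not specified here.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def session_request(
    session_id: str,
    method: str = "GET",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a session request.

    GET    -> fetch session information and question history
    DELETE -> reset the session
    """
    if method not in ("GET", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    # Percent-encode the ID so unusual characters cannot alter the path.
    path = "/api/vqa/session/" + urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)
```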
Environment Variables
- DEBUG: Enable debug mode (default: False)
- MODEL_PATH: Path to the trained model (default: ./models/vqa_model_best.pt)
- TEXT_MODEL: Name of the text model (default: bert-base-uncased)
- VISION_MODEL: Name of the vision model (default: google/vit-base-patch16-384)
- HUGGINGFACE_TOKEN: Hugging Face API token
- UPLOAD_DIR: Directory for uploaded images (default: ./uploads)
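To make the defaults concrete, here is one way these variables could be read at startup using only the standard library. This is a sketch, not necessarily how app/config.py does it: the Settings class, load_settings function, and the set of strings accepted as "true" for DEBUG are all assumptions. The default values are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; defaults mirror the list above."""
    debug: bool = False
    model_path: str = "./models/vqa_model_best.pt"
    text_model: str = "bert-base-uncased"
    vision_model: str = "google/vit-base-patch16-384"
    huggingface_token: Optional[str] = None  # no default documented
    upload_dir: str = "./uploads"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to defaults."""
    d = Settings()
    # ASSUMPTION: "1", "true", and "yes" (case-insensitive) enable debug mode.
    debug = env.get("DEBUG", "False").strip().lower() in ("1", "true", "yes")
    return Settings(
        debug=debug,
        model_path=env.get("MODEL_PATH", d.model_path),
        text_model=env.get("TEXT_MODEL", d.text_model),
        vision_model=env.get("VISION_MODEL", d.vision_model),
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        upload_dir=env.get("UPLOAD_DIR", d.upload_dir),
    )
```

Passing the environment as a parameter keeps the function easy to test: load_settings({}) yields the documented defaults without touching the real process environment.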