---
title: VQA
emoji: 🚀
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# VizWiz Visual Question Answering API

This repository contains a FastAPI backend for a Visual Question Answering (VQA) system trained on the VizWiz dataset.

## Features

- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI

## Project Structure

```
project_root/
├── app/
│   ├── main.py                # Main FastAPI application
│   ├── models/                # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py       # VQA model implementation
│   ├── routers/               # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py             # VQA-related endpoints
│   ├── services/              # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py   # Model loading and inference
│   │   └── session_service.py # Session management
│   ├── utils/                 # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py     # Image processing utilities
│   └── config.py              # Application configuration
├── models/                    # Directory for model files
├── uploads/                   # Directory for uploaded images
├── .env                       # Environment variables
└── requirements.txt           # Project dependencies
```

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/dixisouls/vizwiz-vqa-api.git
   cd vizwiz-vqa-api
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create the necessary directories:

   ```bash
   mkdir -p models uploads
   ```

5. Place your trained model in the `models/` directory.

6. Update the `.env` file with your configuration.

## Running the Application

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000.

API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

## API Endpoints

### Health Check

```
GET /health
```

Returns the health status of the API.
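
For example, a quick liveness check with Python's `requests` (assuming the server is running locally on port 8000):

```python
import requests

# Quick liveness check against a locally running instance.
resp = requests.get("http://localhost:8000/health")
print(resp.status_code, resp.json())
```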

### Upload Image

```
POST /api/vqa/upload
```

Upload an image and create a new session.
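
A minimal client sketch: the multipart field name (`file`) and the response key (`session_id`) are assumptions here, so check the Swagger UI at `/docs` for the exact schema.

```python
import requests

BASE_URL = "http://localhost:8000"

# Upload an image as multipart form data. The field name "file" and
# the "session_id" response key are assumptions; verify via /docs.
with open("example.jpg", "rb") as f:
    resp = requests.post(f"{BASE_URL}/api/vqa/upload", files={"file": f})
resp.raise_for_status()
session_id = resp.json()["session_id"]
print("Created session:", session_id)
```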

### Ask Question

```
POST /api/vqa/ask
```

Ask a question about an uploaded image.
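
A sketch of asking a question in an existing session; the request and response field names (`session_id`, `question`, `answer`, `confidence`) are assumptions based on the feature list above.

```python
import requests

BASE_URL = "http://localhost:8000"
session_id = "your-session-id"  # returned by /api/vqa/upload

# Field names here are assumptions; per the feature list, the API
# returns an answer together with a confidence score.
payload = {"session_id": session_id, "question": "What color is the mug?"}
resp = requests.post(f"{BASE_URL}/api/vqa/ask", json=payload)
resp.raise_for_status()
result = resp.json()
print(result.get("answer"), result.get("confidence"))
```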

### Get Session

```
GET /api/vqa/session/{session_id}
```

Get session information including question history.

### Reset Session

```
DELETE /api/vqa/session/{session_id}
```

Reset a session to start fresh.
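
Together, a sketch of inspecting and then resetting a session (paths from the endpoint list above; the response shape is an assumption):

```python
import requests

BASE_URL = "http://localhost:8000"
session_id = "your-session-id"  # returned by /api/vqa/upload

# Fetch session info, including the question history.
info = requests.get(f"{BASE_URL}/api/vqa/session/{session_id}")
print(info.json())

# Reset the session to start fresh.
requests.delete(f"{BASE_URL}/api/vqa/session/{session_id}")
```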

## Environment Variables

- `DEBUG`: Enable debug mode (default: `False`)
- `MODEL_PATH`: Path to the trained model (default: `./models/vqa_model_best.pt`)
- `TEXT_MODEL`: Name of the text model (default: `bert-base-uncased`)
- `VISION_MODEL`: Name of the vision model (default: `google/vit-base-patch16-384`)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: `./uploads`)
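
A sample `.env` mirroring the defaults above (the token value is a placeholder):

```
DEBUG=False
MODEL_PATH=./models/vqa_model_best.pt
TEXT_MODEL=bert-base-uncased
VISION_MODEL=google/vit-base-patch16-384
# Replace with your own Hugging Face token
HUGGINGFACE_TOKEN=hf_your_token_here
UPLOAD_DIR=./uploads
```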

## License

This project is licensed under the MIT License.