---
title: VQA
emoji: π
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---
Check out the configuration reference at
https://huggingface.co/docs/hub/spaces-config-reference
# VizWiz Visual Question Answering API
This repository contains a FastAPI backend for a Visual Question Answering (VQA)
system trained on the VizWiz dataset.
## Features
- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI
## Project Structure
```
project_root/
├── app/
│   ├── main.py                 # Main FastAPI application
│   ├── models/                 # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py        # VQA model implementation
│   ├── routers/                # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py              # VQA-related endpoints
│   ├── services/               # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py    # Model loading and inference
│   │   └── session_service.py  # Session management
│   ├── utils/                  # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py      # Image processing utilities
│   └── config.py               # Application configuration
├── models/                     # Directory for model files
├── uploads/                    # Directory for uploaded images
├── .env                        # Environment variables
└── requirements.txt            # Project dependencies
```
## Installation
1. Clone the repository:
```bash
git clone https://github.com/dixisouls/vizwiz-vqa-api.git
cd vizwiz-vqa-api
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Create necessary directories:
```bash
mkdir -p models uploads
```
5. Place your trained model in the `models` directory.
6. Update the `.env` file with your configuration (a sample is shown below).
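A minimal `.env` might look like the following, using the defaults documented in the Environment Variables section (the token value is a placeholder):
```bash
DEBUG=False
MODEL_PATH=./models/vqa_model_best.pt
TEXT_MODEL=bert-base-uncased
VISION_MODEL=google/vit-base-patch16-384
HUGGINGFACE_TOKEN=your_token_here
UPLOAD_DIR=./uploads
```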
## Running the Application
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
The API will be available at http://localhost:8000.
API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## API Endpoints
### Health Check
```
GET /health
```
Returns the health status of the API.
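For example, a quick check with `curl`:
```bash
curl http://localhost:8000/health
```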
### Upload Image
```
POST /api/vqa/upload
```
Upload an image and create a new session.
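A sketch of an upload request with `curl`, assuming the image is sent as a multipart form field named `file` (check the Swagger UI at `/docs` for the exact field name and response schema):
```bash
curl -X POST http://localhost:8000/api/vqa/upload \
  -F "file=@photo.jpg"
```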
### Ask Question
```
POST /api/vqa/ask
```
Ask a question about an uploaded image.
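A sketch with `curl`, assuming the endpoint accepts a JSON body with `session_id` and `question` fields (verify the exact schema in the Swagger UI):
```bash
curl -X POST http://localhost:8000/api/vqa/ask \
  -H "Content-Type: application/json" \
  -d '{"session_id": "YOUR_SESSION_ID", "question": "What color is the shirt?"}'
```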
### Get Session
```
GET /api/vqa/session/{session_id}
```
Get session information including question history.
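For example:
```bash
curl http://localhost:8000/api/vqa/session/YOUR_SESSION_ID
```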
### Reset Session
```
DELETE /api/vqa/session/{session_id}
```
Reset a session to start fresh.
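For example:
```bash
curl -X DELETE http://localhost:8000/api/vqa/session/YOUR_SESSION_ID
```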
## Environment Variables
- `DEBUG`: Enable debug mode (default: False)
- `MODEL_PATH`: Path to the trained model (default: ./models/vqa_model_best.pt)
- `TEXT_MODEL`: Name of the text model (default: bert-base-uncased)
- `VISION_MODEL`: Name of the vision model (default: google/vit-base-patch16-384)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: ./uploads)
## License
[MIT License](LICENSE)