---
title: VQA
emoji: 🚀
colorFrom: gray
colorTo: yellow
sdk: docker
pinned: false
license: mit
short_description: VQA API Endpoint
---

Check out the configuration reference at
https://huggingface.co/docs/hub/spaces-config-reference

# VizWiz Visual Question Answering API

This repository contains a FastAPI backend for a Visual Question Answering (VQA)
system trained on the VizWiz dataset.

## Features

- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI

## Project Structure

```
project_root/
├── app/
│   ├── main.py                    # Main FastAPI application
│   ├── models/                    # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py           # VQA model implementation
│   ├── routers/                   # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py                 # VQA-related endpoints
│   ├── services/                  # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py       # Model loading and inference
│   │   └── session_service.py     # Session management
│   ├── utils/                     # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py         # Image processing utilities
│   └── config.py                  # Application configuration
├── models/                        # Directory for model files
├── uploads/                       # Directory for uploaded images
├── .env                           # Environment variables
└── requirements.txt               # Project dependencies
```

## Installation

1. Clone the repository:

```bash
git clone https://github.com/dixisouls/vizwiz-vqa-api.git
cd vizwiz-vqa-api
```

2. Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Create necessary directories:

```bash
mkdir -p models uploads
```

5. Place your trained model in the `models` directory (by default the application
   looks for `./models/vqa_model_best.pt`; see `MODEL_PATH` under Environment
   Variables).

6. Update the `.env` file with your configuration; the available settings are
   listed in the Environment Variables section below.

## Running the Application

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000.

API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

## API Endpoints

### Health Check

```
GET /health
```

Returns the health status of the API.
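
For example, from Python with `requests` (assuming the server from the Running
the Application section is up on port 8000; the exact response payload depends
on the implementation):

```python
import requests

# Ping the health endpoint of a locally running instance.
response = requests.get("http://localhost:8000/health")
response.raise_for_status()
print(response.json())  # status payload returned by the API
```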

### Upload Image

```
POST /api/vqa/upload
```

Upload an image and create a new session.
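
A minimal sketch of the upload step with Python's `requests`; the multipart
field name (`file`) and the `session_id` response key are assumptions, since
the exact schema is defined in `app/routers/vqa.py`:

```python
import requests

BASE_URL = "http://localhost:8000"

# Send an image as multipart form data; the field name "file" is assumed.
with open("example.jpg", "rb") as image_file:
    response = requests.post(
        f"{BASE_URL}/api/vqa/upload",
        files={"file": ("example.jpg", image_file, "image/jpeg")},
    )
response.raise_for_status()
session_id = response.json()["session_id"]  # assumed response key
print(f"Created session: {session_id}")
```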

### Ask Question

```
POST /api/vqa/ask
```

Ask a question about an uploaded image.
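
Continuing the sketch above, a question can then be asked against that session;
the request fields (`session_id`, `question`) and the shape of the
answer/confidence response are assumptions:

```python
import requests

BASE_URL = "http://localhost:8000"
session_id = "your-session-id"  # returned by the upload endpoint above

# Ask a question about the previously uploaded image (field names are assumed).
response = requests.post(
    f"{BASE_URL}/api/vqa/ask",
    json={"session_id": session_id, "question": "What color is the mug?"},
)
response.raise_for_status()
print(response.json())  # expected to contain the answer and a confidence score
```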

### Get Session

```
GET /api/vqa/session/{session_id}
```

Get session information including question history.

### Reset Session

```
DELETE /api/vqa/session/{session_id}
```

Reset a session to start fresh.
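
A sketch of inspecting and then resetting a session using the two endpoints
above (response shapes are assumptions):

```python
import requests

BASE_URL = "http://localhost:8000"
session_id = "your-session-id"  # returned by the upload endpoint

# Retrieve session information, including the question history.
info = requests.get(f"{BASE_URL}/api/vqa/session/{session_id}")
info.raise_for_status()
print(info.json())

# Reset the session to start fresh.
reset = requests.delete(f"{BASE_URL}/api/vqa/session/{session_id}")
reset.raise_for_status()
```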

## Environment Variables

- `DEBUG`: Enable debug mode (default: False)
- `MODEL_PATH`: Path to the trained model (default: ./models/vqa_model_best.pt)
- `TEXT_MODEL`: Name of the text model (default: bert-base-uncased)
- `VISION_MODEL`: Name of the vision model (default:
  google/vit-base-patch16-384)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: ./uploads)
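
For illustration, a `.env` built from the defaults above might look like the
following (the token value is a placeholder):

```
DEBUG=False
MODEL_PATH=./models/vqa_model_best.pt
TEXT_MODEL=bert-base-uncased
VISION_MODEL=google/vit-base-patch16-384
HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxx
UPLOAD_DIR=./uploads
```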

## License

[MIT License](LICENSE)