dixisouls committed on
Commit 146e974
·
1 Parent(s): b3ab259

Updated readme

Files changed (1): README.md (+142 −1)
short_description: VQA API Endpoint
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# VizWiz Visual Question Answering API

This repository contains a FastAPI backend for a Visual Question Answering (VQA) system trained on the VizWiz dataset.
## Features

- Upload images and ask questions about them
- Get answers with confidence scores
- Session management for asking multiple questions about the same image
- Health check endpoint for monitoring
- API documentation with Swagger UI
## Project Structure

```
project_root/
├── app/
│   ├── main.py                  # Main FastAPI application
│   ├── models/                  # Model definitions
│   │   ├── __init__.py
│   │   └── vqa_model.py         # VQA model implementation
│   ├── routers/                 # API route definitions
│   │   ├── __init__.py
│   │   └── vqa.py               # VQA-related endpoints
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── model_service.py     # Model loading and inference
│   │   └── session_service.py   # Session management
│   ├── utils/                   # Utility functions
│   │   ├── __init__.py
│   │   └── image_utils.py       # Image processing utilities
│   └── config.py                # Application configuration
├── models/                      # Directory for model files
├── uploads/                     # Directory for uploaded images
├── .env                         # Environment variables
└── requirements.txt             # Project dependencies
```
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/dixisouls/vizwiz-vqa-api.git
   cd vizwiz-vqa-api
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create the necessary directories:

   ```bash
   mkdir -p models uploads
   ```

5. Place your trained model in the `models` directory.

6. Update the `.env` file with your configuration.
## Running the Application

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at http://localhost:8000.

API documentation is available at:

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
## API Endpoints

### Health Check

```
GET /health
```

Returns the health status of the API.

### Upload Image

```
POST /api/vqa/upload
```

Uploads an image and creates a new session.

### Ask Question

```
POST /api/vqa/ask
```

Asks a question about a previously uploaded image.

### Get Session

```
GET /api/vqa/session/{session_id}
```

Returns session information, including the question history.

### Reset Session

```
DELETE /api/vqa/session/{session_id}
```

Resets a session to start fresh.
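The session endpoints above imply a small amount of server-side state tying a session id to an uploaded image and its question history. A minimal in-memory sketch of what `app/services/session_service.py` could look like (the `SessionStore` class and its method names are assumptions, not the repository's actual interface):

```python
import uuid

class SessionStore:
    """Hypothetical in-memory session store; the real service may keep more state."""

    def __init__(self):
        self._sessions: dict[str, dict] = {}

    def create(self, image_path: str) -> str:
        """Create a session for an uploaded image and return its id."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"image_path": image_path, "history": []}
        return session_id

    def record_question(self, session_id: str, question: str,
                        answer: str, confidence: float) -> None:
        """Append a question/answer pair to the session's history."""
        self._sessions[session_id]["history"].append(
            {"question": question, "answer": answer, "confidence": confidence}
        )

    def get(self, session_id: str) -> dict:
        """Return the session record (raises KeyError for unknown ids)."""
        return self._sessions[session_id]

    def reset(self, session_id: str) -> None:
        """Clear the history but keep the uploaded-image association."""
        self._sessions[session_id]["history"].clear()
```

An in-memory dict like this is lost on restart; a production deployment would typically add expiry and persistence.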
## Environment Variables

- `DEBUG`: Enable debug mode (default: `False`)
- `MODEL_PATH`: Path to the trained model (default: `./models/vqa_model_best.pt`)
- `TEXT_MODEL`: Name of the text model (default: `bert-base-uncased`)
- `VISION_MODEL`: Name of the vision model (default: `google/vit-base-patch16-384`)
- `HUGGINGFACE_TOKEN`: Hugging Face API token
- `UPLOAD_DIR`: Directory for uploaded images (default: `./uploads`)
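These variables can be read with `os.getenv`, falling back to the documented defaults. A sketch of what `app/config.py` might do (the `Settings` dataclass and field names are assumptions for illustration):

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Hypothetical settings object mirroring the environment variables above."""
    # Defaults are evaluated once, when the class is defined.
    debug: bool = os.getenv("DEBUG", "False").lower() in ("1", "true", "yes")
    model_path: str = os.getenv("MODEL_PATH", "./models/vqa_model_best.pt")
    text_model: str = os.getenv("TEXT_MODEL", "bert-base-uncased")
    vision_model: str = os.getenv("VISION_MODEL", "google/vit-base-patch16-384")
    huggingface_token: str = os.getenv("HUGGINGFACE_TOKEN", "")
    upload_dir: str = os.getenv("UPLOAD_DIR", "./uploads")

# A single module-level instance lets the rest of the app import one object.
settings = Settings()
```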
## License

[MIT License](LICENSE)