Updating description
app.py CHANGED
@@ -10,14 +10,14 @@ warnings.filterwarnings('ignore')
 
 # Define problem statement
 problem_statement = """
-###
-
+### Overview
+This project aims to generate descriptive spoken captions for images, leveraging CNNs and RNNs for feature extraction and sequence generation, respectively. The model is trained on the Flickr8K dataset and extended with an attention mechanism for enhanced accessibility.
 """
 
 # Define solution overview
 solution_overview = """
 ### Solution Overview
-The basic model, trained for a limited duration without extensive hyperparameter tuning, primarily focuses on exploring
+The basic model, trained for a limited duration without extensive hyperparameter tuning, primarily focuses on exploring the integration of the attention mechanism with the Encoder-Decoder architecture for image processing. To improve inference quality, Vit-GPT2 architecture is integrated. [Visit the Kaggle notebook](https://www.kaggle.com/code/krishna2308/eye-for-blind) for implementation details.
 """
 
 # Define real-life scenario application
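
The updated solution overview points to a ViT-GPT2 encoder-decoder for higher-quality captions. Below is a minimal sketch of what that inference path could look like with the Hugging Face transformers library; the checkpoint name, image path, and gTTS speech step are illustrative assumptions and are not taken from app.py or the linked notebook.

```python
# Minimal sketch (not from this commit): ViT-GPT2 captioning with transformers.
# The checkpoint, image path, and gTTS step are assumptions for illustration.
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image

checkpoint = "nlpconnect/vit-gpt2-image-captioning"  # assumed public ViT-GPT2 checkpoint
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
processor = ViTImageProcessor.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)

# The app presumably reads the caption aloud; gTTS is one way that could be done.
from gtts import gTTS
gTTS(caption).save("caption.mp3")
```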