---
title: Accent Analyzer Agent
emoji: π’
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Detection of various English accents
license: mit
---
# Accent Analyzer

This Streamlit-based web application analyzes the English accent of speech in videos. Users can provide a public video URL (MP4), receive a transcription generated with Whisper base, and ask follow-up questions about the transcript, answered by Gemma3:1b.
## What It Does

- Accepts a public **MP4 video URL**
- Extracts the audio and transcribes it using **OpenAI Whisper base**
- Detects the accent with the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model
- Lets users ask **follow-up questions** about the transcript using **Gemma3**
- Deploys easily on **Hugging Face Spaces** with CPU
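The transcription and accent-detection steps above can be sketched as follows. This is an illustrative outline, not the app's actual source: the function names and the `format_report` helper are hypothetical, while the model identifiers come from this README. The heavy models are loaded lazily inside the functions so nothing downloads at import time.

```python
def transcribe(audio_path: str) -> str:
    """Speech-to-text with OpenAI Whisper (base)."""
    import whisper  # provided by the openai-whisper package
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

def classify_accent(audio_path: str):
    """English accent classification via SpeechBrain's foreign_class loader."""
    from speechbrain.pretrained.interfaces import foreign_class
    clf = foreign_class(
        source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
    )
    out_prob, score, index, label = clf.classify_file(audio_path)
    if isinstance(label, list):  # SpeechBrain may return the label in a list
        label = label[0]
    return str(label), float(score)

def format_report(accent: str, confidence: float) -> str:
    """Pure helper (hypothetical): render the result for the UI."""
    return f"Detected accent: {accent} ({confidence:.0%} confidence)"
```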
---
## Tech Stack

- **Streamlit**: UI framework
- **OpenAI Whisper (base)**: Speech-to-text transcription
- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: English accent classification
- **Gemma3:1b via Ollama**: Answers follow-up questions using context from the transcript
- **Docker**: Containerization for deployment
- **Hugging Face Spaces**: Hosting on CPU
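A follow-up question could be sent to the local Ollama server roughly like this. This is a hedged sketch, not the app's code: `build_prompt` and `ask_followup` are illustrative names, while `/api/generate` on port 11434 is Ollama's standard HTTP endpoint.

```python
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_prompt(transcript: str, question: str) -> str:
    """Pure helper (hypothetical): ground the question in the transcript."""
    return (
        "You are given a video transcript.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_followup(transcript: str, question: str) -> str:
    """POST a non-streaming generate request to the local Ollama server."""
    import requests  # imported lazily so the module loads without requests
    payload = {
        "model": "gemma3:1b",
        "prompt": build_prompt(transcript, question),
        "stream": False,
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]
```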
---
## Project Structure

```
accent-analyzer/
├── Dockerfile               # Container setup
├── start.sh                 # Serves Ollama and launches the app
├── README.md                # Instructions for the app
├── requirements.txt         # Python dependencies
├── streamlit_app.py         # Main UI app
└── src/
    ├── custome_interface.py # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py   # Audio analysis tool
    └── app/
        └── main_agent.py    # Analysis + LLM agents
```
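The `start.sh` in the tree above boots both services. The repo's actual script may differ; a minimal sketch, assuming Ollama and Streamlit are installed in the image, could look like:

```shell
#!/usr/bin/env bash
# Illustrative start.sh: run Ollama in the background, pull the model,
# then serve the Streamlit UI on the port declared in the Space config.
set -e
ollama serve &
sleep 5                      # give the Ollama server time to come up
ollama pull gemma3:1b
streamlit run streamlit_app.py --server.port 8501 --server.address 0.0.0.0
```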
---
## Running Locally (GPU Required)

1. Clone the repo:
```bash
git clone https://huggingface.co/spaces/ash-171/accent-detection
cd accent-detection
```
2. Build the Docker image:
```bash
docker build -t accent-analyzer .
```
3. Run the container:
```bash
docker run --gpus all -p 8501:8501 accent-analyzer
```
4. Alternatively, run `streamlit run streamlit_app.py` to launch the app without Docker.
5. Visit: [http://localhost:8501](http://localhost:8501)

---
## Requirements

`requirements.txt` should include at least:

```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```

---
## Notes

- Gemma3:1b is served by **Ollama** inside the Docker container; make sure the model is pulled during the image build.
- `custome_interface.py` is required by the accent model; it is downloaded automatically in the Dockerfile.
- Video URLs must be **direct links** to `.mp4` files.
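Since only direct `.mp4` links are accepted, a cheap pre-check can reject obvious non-direct URLs before any download is attempted. The helper below is hypothetical (not from the app's source); the definitive check would still be the `Content-Type` of the HTTP response.

```python
from urllib.parse import urlparse

def looks_like_direct_mp4(url: str) -> bool:
    """Heuristic (hypothetical helper): True if the URL is HTTP(S)
    and its path ends in .mp4, i.e. it looks like a direct file link."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".mp4")
```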
---
## Example Prompt

```
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```

Then follow up with:

```
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```

---
## Acknowledgments

This project uses the following models, frameworks, and tools:

- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit for building and fine-tuning speech processing models.
- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
- [Gemma3:1b](https://ollama.com/library/gemma3:1b) via [Ollama](https://ollama.com): Large language model used to answer follow-up questions about transcripts.
- [Streamlit](https://streamlit.io): Python framework for building web applications.
- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used to deploy this application.

---
## Note

Because no GPU is available on the hosted Space, the app runs very slowly there. The output has been tested and verified on a local system.

---
## Author

- Developed by [Aswathi T S](https://github.com/ash-171)

---

## License

This project is licensed under the `MIT License`.