---
title: Accent Analyzer Agent
emoji: 🏒
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Various English accent detection
license: mit
---

# Accent Analyzer

This is a Streamlit-based web application that analyzes the English accent in spoken videos. Users can provide a public video URL (MP4), receive a transcription of the speech using Whisper Base, and ask follow-up questions based on the transcript using Gemma3:1b.

## What It Does

- Accepts a public **MP4 video URL**
- Extracts audio and transcribes it using **OpenAI Whisper Base**
- Detects the accent using the **Jzuluaga/accent-id-commonaccent_xlsr-en-english** model
- Lets users ask **follow-up questions** about the transcript using **Gemma3**
- Deploys easily on **Hugging Face Spaces** with CPU

---

## Tech Stack

- **Streamlit**: UI
- **OpenAI Whisper (base)**: speech-to-text transcription
- **Jzuluaga/accent-id-commonaccent_xlsr-en-english**: English accent classification
- **Gemma3:1b via Ollama**: generates answers to follow-up questions using context from the transcript
- **Docker**: containerized for deployment
- **Hugging Face Spaces**: hosting with CPU

---

## Project Structure

```
accent-analyzer/
├── Dockerfile                # Container setup
├── start.sh                  # Serves Ollama and starts the app
├── README.md                 # Instructions for the app
├── requirements.txt          # Python dependencies
├── streamlit_app.py          # Main UI app
└── src/
    ├── custome_interface.py  # SpeechBrain custom interface
    ├── tools/
    │   └── accent_tool.py    # Audio analysis tool
    └── app/
        └── main_agent.py     # Analysis + LLM agents
```

---

## Running Locally (GPU Required)

1. Clone the repo:

   ```bash
   git clone https://huggingface.co/spaces/ash-171/accent-detection
   cd accent-analyzer
   ```

2. Build the Docker image:

   ```bash
   docker build -t accent-analyzer .
   ```

3. Run the container:

   ```bash
   docker run --gpus all -p 8501:8501 accent-analyzer
   ```

4. Alternatively, run the app directly without Docker:
   ```bash
   streamlit run streamlit_app.py
   ```

5. Visit [http://localhost:8501](http://localhost:8501).

---

## Requirements

`requirements.txt` should include at least:

```
streamlit>=1.25.0
requests==2.31.0
pydub==0.25.1
torch==1.11.0
torchaudio==0.11.0
speechbrain==0.5.12
transformers==4.29.2
ffmpeg-python==0.2.0
openai-whisper==20230314
numpy==1.22.4
langchain>=0.1.0
langchain-community>=0.0.30
torchvision==0.12.0
langgraph>=0.0.20
```

---

## Notes

- Gemma3:1b is accessed via **Ollama** inside Docker; ensure it is pulled during the build.
- `custome_interface.py` is required by the accent model; it is downloaded automatically in the Dockerfile.
- Video URLs must be **direct links** to `.mp4` files.

---

## Example Prompt

```
Analyze this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
```

Then follow up with:

```
Where is the speaker probably from?
What is the tone or emotion?
Summarize the video.
```

---

## Acknowledgments

This project uses the following models, frameworks, and tools:

- [OpenAI Whisper](https://github.com/openai/whisper): Automatic speech recognition model.
- [SpeechBrain](https://speechbrain.readthedocs.io/): Toolkit used for building and fine-tuning speech processing models.
- [Accent-ID CommonAccent](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english): Fine-tuned wav2vec2 model hosted on Hugging Face for English accent classification.
- [CustomEncoderWav2vec2Classifier](https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-en-english/blob/main/custom_interface.py): Custom interface used to load and run the accent model.
- [Gemma3:1b](https://ollama.com/library/gemma3:1b) via [Ollama](https://ollama.com): Large language model used for natural-language follow-up based on transcripts.
- [Streamlit](https://streamlit.io): Python framework for building web applications.
- [Hugging Face Spaces](https://huggingface.co/spaces): Platform used for deploying this application on CPU infrastructure.

---

## Performance Note

Without a GPU, the app will run extremely slowly. The output has been tested and verified on a local system.

---

## Author

- Developed by [Aswathi T S](https://github.com/ash-171)

---

## License

This project is licensed under the `MIT License`.
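---

## Appendix: Checking for a Direct MP4 Link

The Notes section above requires that video URLs be direct links to `.mp4` files. The check below is a minimal, stdlib-only sketch of that requirement; `is_direct_mp4_url` is an illustrative helper name, not a function from this repository, and the app itself may validate URLs differently (e.g. by inspecting the response `Content-Type`).

```python
from urllib.parse import urlparse


def is_direct_mp4_url(url: str) -> bool:
    """Return True if the URL looks like a direct link to an .mp4 file.

    This only inspects the URL's shape; it does not download anything
    or verify the Content-Type the server actually returns.
    """
    parsed = urlparse(url)
    # Require an HTTP(S) scheme and a path that ends in .mp4.
    if parsed.scheme not in ("http", "https"):
        return False
    return parsed.path.lower().endswith(".mp4")


# The sample URL from the Example Prompt section passes the check;
# a streaming-page URL (query string, no .mp4 path) does not.
print(is_direct_mp4_url(
    "https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4"
))  # True
print(is_direct_mp4_url("https://www.youtube.com/watch?v=abc123"))  # False
```

A check like this can run before the download step, so a bad URL fails fast instead of after audio extraction has already started.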