---
title: REM Waste English Accent Analyzer
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: docker
app_file: app.py # This is often included for clarity, but not strictly used by Docker SDK for execution path
# The port your Docker container exposes. Hugging Face Spaces expects 7860.
app_port: 7860
---
# REM Waste - English Accent Analyzer
This is a web-based application for analyzing English accents from video URLs. It leverages `yt-dlp` for video downloading, `FFmpeg` for audio extraction, and `SpeechBrain` for accent classification.
## Project Overview
The **REM Waste - English Accent Analyzer** is a Flask web application that analyzes the English accent of a speaker in a video, given the video's URL. It uses `yt-dlp` to download the video, `FFmpeg` to extract the audio, and `SpeechBrain` to classify the accent, then shows the result in a responsive web interface.
This tool is particularly useful for:
* Language learners and educators to get feedback on accent.
* Researchers studying speech and accents.
* Content creators to understand their audience's accent distribution.
* Anyone curious about the English accent in a video.
## Features
* **Video Download:** Supports downloading videos from various public platforms, including YouTube, Loom, and direct MP4 links, using `yt-dlp`.
* **Audio Extraction:** Extracts high-quality audio (16kHz, mono WAV) from the downloaded video using `FFmpeg`.
* **English Accent Classification:** Utilizes a pre-trained `SpeechBrain` model (`Jzuluaga/accent-id-commonaccent_ecapa`) to classify the English accent present in the audio.
* **Supported Accents:** The model is trained to recognize **16 distinct English accents**:
* US
* England
* Australia
* Indian
* Canada
* Bermuda
* Scotland
* African
* Ireland
* New Zealand
* Wales
* Malaysia
* Philippines
* Singapore
* Hong Kong
* South Atlantic
* **Confidence Score:** Provides a confidence percentage for the detected accent.
* **Asynchronous Processing:** Uses `Flask-Executor` to handle long-running tasks (video download, audio extraction, accent analysis) in the background, keeping the UI responsive.
* **Responsive UI:** A clean and modern web interface built with HTML and Tailwind CSS, ensuring usability across various devices.
* **Temporary File Management:** Automatically creates and manages temporary directories for video and audio files, with robust cleanup mechanisms.
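The download and extraction features above can be sketched as two small command builders. This is only an illustration: the function names `build_download_command` and `build_extract_command` are hypothetical, and the exact options used in `video_processing.py` may differ. The flags themselves are standard `yt-dlp` and `FFmpeg` options, with the FFmpeg ones chosen to produce the 16 kHz mono WAV described above.

```python
from pathlib import Path
from typing import List


def build_download_command(url: str, output_dir: Path) -> List[str]:
    """Build a yt-dlp command that saves a single video into output_dir."""
    return [
        "yt-dlp",
        "--no-playlist",                          # download one video, not a whole playlist
        "-o", str(output_dir / "video.%(ext)s"),  # output template; yt-dlp fills in the extension
        url,
    ]


def build_extract_command(video_path: Path, audio_path: Path) -> List[str]:
    """Build an FFmpeg command that writes 16 kHz mono 16-bit PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", str(video_path),   # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM, the usual WAV encoding
        str(audio_path),
    ]
```

Either list can be passed to `subprocess.run(command, check=True)`. Building the command as a list instead of a shell string avoids quoting problems with paths that contain spaces.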
## Project Structure
rem_waste_accent_analyzer/
├── app.py                 # Main Flask application, handles routes, background tasks, and orchestrates modules.
├── video_processing.py    # Module for video downloading (yt-dlp) and audio extraction (FFmpeg).
├── accent_analysis.py     # Module for SpeechBrain model loading and accent detection logic.
├── templates/
│   └── index.html         # HTML template for the web interface.
└── static/
    └── style.css          # Custom CSS for styling the UI (uses Tailwind CSS).
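As a rough illustration of how these modules might fit together, the sketch below chains the three stages inside a temporary directory that is removed when the run finishes. It is not the code in `app.py`: `run_pipeline` is a hypothetical name, and the three stages are passed in as callables so the example stays self-contained. Their signatures are assumptions made for this sketch.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_pipeline(
    url: str,
    download_video: Callable[[str, Path], Path],
    extract_audio: Callable[[Path, Path], Path],
    detect_accent: Callable[[Path], dict],
) -> dict:
    """Download, extract, and classify inside a directory that is always cleaned up."""
    # TemporaryDirectory deletes the folder and its contents on exit,
    # even if one of the stages raises an exception.
    with tempfile.TemporaryDirectory(prefix="accent_") as tmp:
        work_dir = Path(tmp)
        video_path = download_video(url, work_dir)        # e.g. a yt-dlp wrapper
        audio_path = extract_audio(video_path, work_dir)  # e.g. an FFmpeg wrapper
        return detect_accent(audio_path)                  # e.g. a SpeechBrain wrapper
```

Passing the stages in as arguments also makes the flow easy to test with stand-in functions, without downloading a video or loading a model.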
## Setup Instructions
Follow these steps to set up and run the application on your local machine.
### Prerequisites
Before you begin, ensure you have the following installed:
* **Python 3.8+**: Download from [python.org](https://www.python.org/downloads/).
* **FFmpeg**: A powerful multimedia framework required for audio extraction.
* **Windows**: Download a static build from [ffmpeg.org/download.html](https://ffmpeg.org/download.html). Extract it and add the `bin` directory to your system's `PATH` environment variable.
* **macOS**: Install via Homebrew: `brew install ffmpeg`
* **Linux (Ubuntu/Debian)**: `sudo apt update && sudo apt install ffmpeg`
* **yt-dlp**: A command-line program to download videos. It is installed via `pip` during the installation steps below and uses `FFmpeg` to merge and convert media.
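A quick way to confirm the prerequisites is a small check script like the one below. It is a minimal sketch: `find_tools` and `python_is_supported` are hypothetical helper names, not part of the project. `yt-dlp` will only be found after the Python dependencies have been installed in the active environment.

```python
import shutil
import sys
from typing import Dict, Optional, Sequence, Tuple


def find_tools(names: Sequence[str] = ("ffmpeg", "yt-dlp")) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def python_is_supported(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", "ok" if python_is_supported() else "too old")
    for tool, path in find_tools().items():
        print(f"{tool}:", path or "not found on PATH")
```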
### Installation Steps
1. **Clone the Repository (or create the project structure manually):**
If you have the project files already, navigate to your project's root directory. Otherwise, create the `rem_waste_accent_analyzer` folder and the `templates/` and `static/` subfolders as shown in the Project Structure.
2. **Navigate to the Project Directory:**
Open your terminal or command prompt and change to your project's root directory:
```bash
cd path\to\rem_waste_accent_analyzer
```
(Replace `path\to\rem_waste_accent_analyzer` with your actual path. On macOS/Linux, use forward slashes.)
3. **Create and Activate a Python Virtual Environment (Highly Recommended):**
A virtual environment isolates your project's dependencies, preventing conflicts with other Python projects.
```bash
python -m venv myenv
```
* **On Windows:**
```bash
.\myenv\Scripts\activate
```
* **On macOS/Linux:**
```bash
source myenv/bin/activate
```
You should see `(myenv)` at the beginning of your terminal prompt, indicating the virtual environment is active.
4. **Install Python Dependencies:**
With your virtual environment activated, install all required Python libraries. This step is crucial for resolving potential version compatibility issues.
```bash
# Uninstall existing versions for a clean slate (important!)
pip uninstall speechbrain transformers torchaudio huggingface_hub numpy scipy tqdm Flask Flask-Executor yt-dlp -y
# Install the latest compatible versions
pip install --upgrade speechbrain transformers torchaudio huggingface_hub numpy scipy tqdm Flask Flask-Executor yt-dlp
```
* **Note on `UserWarning`:** You might see a `UserWarning: Requested Pretrainer collection using symlinks on Windows...` during model loading. This is an informational message from SpeechBrain/PyTorch/HuggingFace about internal file handling and can generally be ignored as it does not prevent the application from functioning.
5. **Manual Hugging Face Cache Cleanup (Optional, if issues persist):**
If you continue to face model loading errors after step 4, you might need to manually clear the Hugging Face cache.
* Delete the `.hf_cache` folder inside your project directory (for example, `D:\Accent Detection\rem_waste\.hf_cache`), or whichever folder your `HF_HOME` environment variable points to.
* Then, try running the application again. This will force a fresh download of the model files.
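The cache cleanup in step 5 can also be scripted. The sketch below assumes the cache lives either where `HF_HOME` points or in a `.hf_cache` folder under the project root; `resolve_hf_cache` and `clear_hf_cache` are hypothetical helper names. Check the resolved path before deleting, since `shutil.rmtree` removes the folder and everything in it.

```python
import os
import shutil
from pathlib import Path


def resolve_hf_cache(project_root: Path) -> Path:
    """Return HF_HOME if it is set, otherwise <project_root>/.hf_cache (assumed location)."""
    hf_home = os.environ.get("HF_HOME")
    return Path(hf_home) if hf_home else project_root / ".hf_cache"


def clear_hf_cache(cache_dir: Path) -> bool:
    """Delete cache_dir and its contents. Return True if anything was removed."""
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    cache = resolve_hf_cache(Path.cwd())
    print("Cache directory:", cache)
    print("Removed" if clear_hf_cache(cache) else "Nothing to remove")
```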
## Usage
1. **Run the Flask Application:**
Ensure your virtual environment is active, then run the main application file:
```bash
python app.py
```
You will see output in your terminal indicating the Flask server is running, typically on `http://127.0.0.1:5000/`.
2. **Access the Web Interface:**
Open your web browser and navigate to the address provided by Flask (e.g., `http://127.0.0.1:5000/`).
3. **Analyze an Accent:**
* Enter a public video URL (e.g., a YouTube video link, a Loom link, or a direct link to an MP4 file) into the "Video URL" input field.
* Click the "Analyze Accent" button.
* The application will display a status message ("Initiating analysis...", "Still processing...").
* Once the analysis is complete, the detected English accent, a confidence score, and a brief summary will appear on the page.
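The status messages in step 3 reflect a submit-then-poll pattern: the analysis starts in the background and the page repeatedly asks whether it has finished. The application uses `Flask-Executor` for this. The sketch below swaps in the standard library's `ThreadPoolExecutor` to show the same idea without Flask, and `submit_analysis`, `job_status`, and `jobs` are hypothetical names rather than the project's own.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

executor = ThreadPoolExecutor(max_workers=2)
jobs: Dict[str, Future] = {}  # in-memory job registry; lost when the process restarts


def submit_analysis(analyze: Callable[[str], dict], video_url: str) -> str:
    """Start analyze(video_url) in the background and return a job id to poll."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(analyze, video_url)
    return job_id


def job_status(job_id: str) -> dict:
    """Report whether a job is unknown, still processing, failed, or done."""
    future = jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "processing"}
    error = future.exception()
    if error is not None:
        return {"status": "failed", "error": str(error)}
    return {"status": "done", "result": future.result()}
```

In a Flask app, one route would call `submit_analysis` and return the job id, and a second route would return `job_status(job_id)` as JSON for the frontend to poll.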
## Error Handling & Troubleshooting
* **"Video download failed: yt-dlp failed: ERROR: unable to open for writing: [Errno 2] No such file or directory..."**:
* This usually indicates `yt-dlp` cannot write to the temporary directory.
* **Solution:** Ensure the `rem_waste_accent_analyzer` folder and its `temp_files` subdirectory have full write permissions for your user account. The `subprocess.run` call in `video_processing.py` sets `cwd` to mitigate this, but underlying OS permissions can still interfere. Running your terminal/command prompt as Administrator might temporarily resolve this for testing.
* **"Error opening 'D:\...\\audio_...wav': System error." (during accent detection):**
* This indicates `SpeechBrain` is having trouble accessing the audio file.
* **Solution:** The application avoids this by converting the path to a relative path (`os.path.relpath`) before passing it to `detect_accent`. Ensure your `app.py` and `accent_analysis.py` files are the latest versions.
* **"Error loading SpeechBrain model: No huggingface_hub attribute cached_download" or "There is no such class as speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2"**:
* These are version compatibility issues between SpeechBrain and its dependencies.
* **Solution:** Follow the "Install Python Dependencies" step (Step 4) very carefully, including the `pip uninstall` command for a clean installation. If it persists, try the "Manual Hugging Face Cache Cleanup" (Step 5).
* **"Analysis completed successfully!" but no results on webpage:**
* This means the backend is working, but the frontend isn't displaying the data.
* **Solution:** Ensure your `templates/index.html` file includes the latest `showResults` function, which explicitly removes the `hidden` class and sets `style.display = 'block'` for the results container. Check your browser's developer console (F12) for any JavaScript errors or the `console.log("Received data from backend:", data);` output to see the exact data structure.
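For the first two errors above, two small diagnostics can help narrow down the cause. This is a sketch with hypothetical helper names (`is_writable`, `relative_audio_path`), not code from the project. The first checks that a file can actually be created in the temporary directory. The second shows the `os.path.relpath` conversion with a fallback, since on Windows `relpath` raises `ValueError` when the two paths are on different drives.

```python
import os
import tempfile
from pathlib import Path


def is_writable(directory: Path) -> bool:
    """Return True if a file can actually be created inside directory."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        # Creating a real file is a more reliable test than inspecting permission bits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def relative_audio_path(audio_path: Path, start: Path) -> str:
    """Return audio_path relative to start, falling back to the original path."""
    try:
        return os.path.relpath(audio_path, start)
    except ValueError:
        # Raised on Windows when audio_path and start are on different drives.
        return str(audio_path)
```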
## Technologies Used
* **Backend:**
* [Flask](https://flask.palletsprojects.com/): Python web framework.
* [Flask-Executor](https://flask-executor.readthedocs.io/): For running background tasks.
* [yt-dlp](https://github.com/yt-dlp/yt-dlp): Video downloading.
* [FFmpeg](https://ffmpeg.org/): Audio extraction and conversion.
* [SpeechBrain](https://speechbrain.github.io/): Open-source speech toolkit for accent classification.
* [PyTorch](https://pytorch.org/): Deep learning framework (underpins SpeechBrain).
* [Hugging Face Hub](https://huggingface.co/): For hosting and accessing pre-trained models.
* **Frontend:**
* HTML5
* [Tailwind CSS](https://tailwindcss.com/): Utility-first CSS framework for rapid UI development.
* JavaScript (Fetch API for AJAX, DOM manipulation).
## License
This project is open-source and available under the [MIT License](https://opensource.org/licenses/MIT).