---
title: REM Waste English Accent Analyzer
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: docker
app_file: app.py # This is often included for clarity, but not strictly used by Docker SDK for execution path
# The port your Docker container exposes. Hugging Face Docker Spaces read this from `app_port` (default: 7860).
app_port: 7860
---
# REM Waste - English Accent Analyzer
This is a web-based application for analyzing English accents from video URLs. It leverages `yt-dlp` for video downloading, `FFmpeg` for audio extraction, and `SpeechBrain` for accent classification.
## Project Overview
The **REM Waste - English Accent Analyzer** is built with Flask. Given a video URL, it downloads the video with `yt-dlp`, extracts the audio track with `FFmpeg`, and classifies the speaker's English accent with a pre-trained `SpeechBrain` model. Results are shown in a responsive web interface.
This tool is particularly useful for:
* Language learners and educators to get feedback on accent.
* Researchers studying speech and accents.
* Content creators to understand their audience's accent distribution.
* Anyone curious about the English accent in a video.
## Features
* **Video Download:** Supports downloading videos from various public platforms, including YouTube, Loom, and direct MP4 links, using `yt-dlp`.
* **Audio Extraction:** Extracts high-quality audio (16kHz, mono WAV) from the downloaded video using `FFmpeg`.
* **English Accent Classification:** Utilizes a pre-trained `SpeechBrain` model (`Jzuluaga/accent-id-commonaccent_ecapa`) to classify the English accent present in the audio.
* **Supported Accents:** The model is trained to recognize **16 distinct English accents**:
    * US
    * England
    * Australia
    * Indian
    * Canada
    * Bermuda
    * Scotland
    * African
    * Ireland
    * New Zealand
    * Wales
    * Malaysia
    * Philippines
    * Singapore
    * Hong Kong
    * South Atlantic
* **Confidence Score:** Provides a confidence percentage for the detected accent.
* **Asynchronous Processing:** Uses `Flask-Executor` to handle long-running tasks (video download, audio extraction, accent analysis) in the background, keeping the UI responsive.
* **Responsive UI:** A clean and modern web interface built with HTML and Tailwind CSS, ensuring usability across various devices.
* **Temporary File Management:** Automatically creates and manages temporary directories for video and audio files, with robust cleanup mechanisms.
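
As a rough illustration of the audio-extraction feature, the sketch below builds an FFmpeg command that converts a video into 16 kHz mono WAV. The function names `build_ffmpeg_command` and `extract_audio` are hypothetical and are not taken from `video_processing.py`; the flags themselves (`-vn`, `-ac`, `-ar`, `-acodec pcm_s16le`) are standard FFmpeg options.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Return an FFmpeg argument list that writes mono PCM WAV at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample (16 kHz by default)
        "-acodec", "pcm_s16le",    # 16-bit PCM, the usual WAV encoding
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run FFmpeg; raises subprocess.CalledProcessError if it exits non-zero."""
    cmd = build_ffmpeg_command(video_path, wav_path)
    subprocess.run(cmd, check=True, capture_output=True, text=True)
    return Path(wav_path)
```

Passing the command as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.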
## Project Structure
rem_waste_accent_analyzer/
├── app.py # Main Flask application, handles routes, background tasks, and orchestrates modules.
├── video_processing.py # Module for video downloading (yt-dlp) and audio extraction (FFmpeg).
├── accent_analysis.py # Module for SpeechBrain model loading and accent detection logic.
├── templates/
│   └── index.html # HTML template for the web interface.
└── static/
    └── style.css # Custom CSS for styling the UI (uses Tailwind CSS).
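
The role of `accent_analysis.py` can be sketched roughly as follows. This is a minimal sketch, not the module's actual code: `load_classifier`, the `savedir` value, and the signature given to `detect_accent` here are assumptions. It also assumes that `classify_file` returns the 4-tuple `(out_prob, score, index, text_lab)` shown on the model card, and that `score` can be read as a probability, which depends on the model's output layer.

```python
def load_classifier(savedir="pretrained_models/accent-id"):
    """Load the pre-trained model (requires speechbrain; downloads on first use)."""
    # Imported lazily so this module can be imported without SpeechBrain installed.
    try:
        from speechbrain.inference.classifiers import EncoderClassifier  # SpeechBrain >= 1.0
    except ImportError:
        from speechbrain.pretrained import EncoderClassifier  # older releases
    return EncoderClassifier.from_hparams(
        source="Jzuluaga/accent-id-commonaccent_ecapa",
        savedir=savedir,  # hypothetical local directory for the model files
    )


def detect_accent(classifier, wav_path):
    """Classify one audio file and return (label, confidence_percent).

    Assumption: classify_file returns (out_prob, score, index, text_lab),
    each holding one entry for the single input file.
    """
    out_prob, score, index, text_lab = classifier.classify_file(wav_path)
    label = text_lab[0]
    # Assumption: score is a probability in [0, 1]; scale it to a percentage.
    confidence = float(score[0]) * 100.0
    return label, round(confidence, 2)
```

Taking the classifier as an argument means the model is loaded once and reused across requests, and the function can be exercised with a stub in tests.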
## Setup Instructions
Follow these steps to set up and run the application on your local machine.
### Prerequisites
Before you begin, ensure you have the following installed:
* **Python 3.8+**: Download from [python.org](https://www.python.org/downloads/).
* **FFmpeg**: A powerful multimedia framework required for audio extraction.
    * **Windows**: Download a static build from [ffmpeg.org/download.html](https://ffmpeg.org/download.html). Extract it and add the `bin` directory to your system's `PATH` environment variable.
    * **macOS**: Install via Homebrew: `brew install ffmpeg`
    * **Linux (Ubuntu/Debian)**: `sudo apt update && sudo apt install ffmpeg`
* **yt-dlp**: A command-line video downloader. It is installed via `pip` in the installation steps, and it relies on `FFmpeg` for merging and converting media.
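
A quick way to confirm that the command-line tools are reachable on `PATH` is sketched below. The helper name `missing_tools` is hypothetical; `shutil.which` is part of the Python standard library. Note that `yt-dlp` will only be found after the installation steps have been completed and the virtual environment is active.

```python
import shutil


def missing_tools(tools=("ffmpeg", "yt-dlp")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools are available.")
```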
### Installation Steps
1. **Clone the Repository (or create the project structure manually):**
If you have the project files already, navigate to your project's root directory. Otherwise, create the `rem_waste_accent_analyzer` folder and the `templates/` and `static/` subfolders as shown in the Project Structure.
2. **Navigate to the Project Directory:**
Open your terminal or command prompt and change to your project's root directory:
```bash
cd path\to\rem_waste_accent_analyzer
```
(Replace `path\to\rem_waste_accent_analyzer` with your actual path. On macOS/Linux, use forward slashes.)
3. **Create and Activate a Python Virtual Environment (Highly Recommended):**
A virtual environment isolates your project's dependencies, preventing conflicts with other Python projects.
```bash
python -m venv myenv
```
* **On Windows:**
```bash
.\myenv\Scripts\activate
```
* **On macOS/Linux:**
```bash
source myenv/bin/activate
```
You should see `(myenv)` at the beginning of your terminal prompt, indicating the virtual environment is active.
4. **Install Python Dependencies:**
With your virtual environment activated, install all required Python libraries. This step is crucial for resolving potential version compatibility issues.
```bash
# Uninstall existing versions for a clean slate (important!)
pip uninstall speechbrain transformers torchaudio huggingface_hub numpy scipy tqdm Flask Flask-Executor yt-dlp -y
# Install the latest compatible versions
pip install --upgrade speechbrain transformers torchaudio huggingface_hub numpy scipy tqdm Flask Flask-Executor yt-dlp
```
* **Note on `UserWarning`:** You might see a `UserWarning: Requested Pretrainer collection using symlinks on Windows...` during model loading. This is an informational message from SpeechBrain/PyTorch/HuggingFace about internal file handling and can generally be ignored as it does not prevent the application from functioning.
5. **Manual Hugging Face Cache Cleanup (Optional, if issues persist):**
If model loading errors persist after step 4, you may need to clear the Hugging Face cache manually.
* Delete the cache folder that your `HF_HOME` environment variable points to within your project (for example, `D:\Accent Detection\rem_waste\.hf_cache`).
* Then run the application again. This forces a fresh download of the model files.
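
The cache cleanup in step 5 can also be scripted. The sketch below is one possible approach, assuming the cache lives wherever `HF_HOME` points and falling back to a `.hf_cache` folder in the current directory; the function name `clear_hf_cache` and that fallback are assumptions, not part of the project.

```python
import os
import shutil
from pathlib import Path


def clear_hf_cache(default_dir=".hf_cache"):
    """Delete the Hugging Face cache directory; return True if it existed."""
    # HF_HOME is the environment variable Hugging Face libraries use for their cache root.
    cache_dir = Path(os.environ.get("HF_HOME", default_dir))
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

Double-check the value of `HF_HOME` before running this, since the directory is removed recursively.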
## Usage
1. **Run the Flask Application:**
Ensure your virtual environment is active, then run the main application file:
```bash
python app.py
```
You will see output in your terminal indicating the Flask server is running, typically on `http://127.0.0.1:5000/`.
2. **Access the Web Interface:**
Open your web browser and navigate to the address provided by Flask (e.g., `http://127.0.0.1:5000/`).
3. **Analyze an Accent:**
    * Enter a public video URL (e.g., a YouTube video link, a Loom link, or a direct link to an MP4 file) into the "Video URL" input field.
    * Click the "Analyze Accent" button.
    * The application will display a status message ("Initiating analysis...", "Still processing...").
    * Once the analysis is complete, the detected English accent, a confidence score, and a brief summary will appear on the page.
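
The status messages above come from the frontend polling a background job. The pattern can be sketched as follows, with the standard-library `ThreadPoolExecutor` standing in for `Flask-Executor` so that the example runs without Flask. The names `start_job`, `job_status`, and `jobs` are hypothetical and do not come from `app.py`.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future; an in-memory registry, lost on restart


def start_job(func, *args):
    """Submit func(*args) to the pool and return an ID the client can poll."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(func, *args)
    return job_id


def job_status(job_id):
    """Report the state of a job as a JSON-serializable dict."""
    future = jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "processing"}
    error = future.exception()
    if error is not None:
        return {"status": "error", "message": str(error)}
    return {"status": "done", "result": future.result()}
```

In a web application, one route would call `start_job` and return the ID, and a second route would return `job_status(job_id)` each time the page polls. Finished entries should eventually be removed from `jobs` so the registry does not grow without bound.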
## Error Handling & Troubleshooting
* **"Video download failed: yt-dlp failed: ERROR: unable to open for writing: [Errno 2] No such file or directory..."**
    * This usually indicates that `yt-dlp` cannot write to the temporary directory.
    * **Solution:** Ensure the `rem_waste_accent_analyzer` folder and its `temp_files` subdirectory are writable by your user account. `video_processing.py` calls `subprocess.run` with `cwd` set to mitigate this, but OS-level permissions can still interfere. Running your terminal or command prompt as Administrator may work around the problem temporarily for testing.
* **"Error opening 'D:\...\\audio_...wav': System error."** (during accent detection)
    * This indicates that `SpeechBrain` cannot access the audio file.
    * **Solution:** The application converts the path to a relative path (`os.path.relpath`) before passing it to `detect_accent`. Ensure your `app.py` and `accent_analysis.py` include this change.
* **"Error loading SpeechBrain model: No huggingface_hub attribute cached_download"** or **"There is no such class as speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2"**
    * These are version compatibility issues between SpeechBrain and its dependencies.
    * **Solution:** Follow the "Install Python Dependencies" step (Step 4) carefully, including the `pip uninstall` command for a clean installation. If the error persists, try the "Manual Hugging Face Cache Cleanup" (Step 5).
* **"Analysis completed successfully!" but no results on the webpage**
    * This means the backend is working, but the frontend is not displaying the data.
    * **Solution:** Ensure your `templates/index.html` includes a `showResults` function that explicitly removes the `hidden` class and sets `style.display = 'block'` on the results container. Check your browser's developer console (F12) for JavaScript errors, or for the `console.log("Received data from backend:", data);` output, to see the exact data structure.
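
The two path-related fixes mentioned above (running `yt-dlp` with `cwd` set, and passing a relative audio path) might look roughly like this. It is a sketch under assumptions: the function names and the `video.%(ext)s` output template are hypothetical, while `-o` and `--no-playlist` are standard `yt-dlp` options.

```python
import os
import subprocess


def build_ytdlp_command(url, output_template="video.%(ext)s"):
    """Return a yt-dlp argument list that writes into the current directory."""
    return ["yt-dlp", "--no-playlist", "-o", output_template, url]


def download_video(url, work_dir):
    """Run yt-dlp inside work_dir so it only ever writes relative paths."""
    os.makedirs(work_dir, exist_ok=True)
    subprocess.run(
        build_ytdlp_command(url),
        cwd=work_dir,          # yt-dlp resolves the output template against this directory
        check=True,
        capture_output=True,
        text=True,
    )


def to_relative(path, start=None):
    """Convert path to one relative to start (default: current directory)."""
    try:
        return os.path.relpath(path, start or os.getcwd())
    except ValueError:
        # On Windows, relpath fails when path and start are on different drives.
        return os.path.abspath(path)
```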
## Technologies Used
* **Backend:**
    * [Flask](https://flask.palletsprojects.com/): Python web framework.
    * [Flask-Executor](https://flask-executor.readthedocs.io/): For running background tasks.
    * [yt-dlp](https://github.com/yt-dlp/yt-dlp): Video downloading.
    * [FFmpeg](https://ffmpeg.org/): Audio extraction and conversion.
    * [SpeechBrain](https://speechbrain.github.io/): Open-source speech toolkit for accent classification.
    * [PyTorch](https://pytorch.org/): Deep learning framework (underpins SpeechBrain).
    * [Hugging Face Hub](https://huggingface.co/): For hosting and accessing pre-trained models.
* **Frontend:**
    * HTML5
    * [Tailwind CSS](https://tailwindcss.com/): Utility-first CSS framework for rapid UI development.
    * JavaScript (Fetch API for AJAX, DOM manipulation).
## License
This project is open-source and available under the [MIT License](https://opensource.org/licenses/MIT).