Spaces:

tasmimulhuda
/

english-accent-detection-from-video-url

Sleeping

File size: 10,372 Bytes

5facf75
 
 
 
 
 
 
 
 
 
15a1f73
5facf75
15a1f73
5facf75
15a1f73
5facf75
15a1f73
5facf75
15a1f73
5facf75
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15a1f73
 
 
 
 
 
 
 
 
5facf75
15a1f73
 
5facf75
15a1f73
5facf75
15a1f73
5facf75
15a1f73
5facf75
15a1f73
5facf75

---
title: REM Waste English Accent Analyzer
emoji: 🎙️
colorFrom: purple
colorTo: pink
sdk: docker
app_file: app.py # This is often included for clarity, but not strictly used by Docker SDK for execution path
# The port your Docker container exposes. Hugging Face Spaces expects 7860.
port: 7860
---

# REM Waste - English Accent Analyzer

This is a web-based application for analyzing English accents from video URLs. It leverages `yt-dlp` for video downloading, `FFmpeg` for audio extraction, and `SpeechBrain` for accent classification.

## Project Overview

The **REM Waste - English Accent Analyzer** is a web-based application built with Flask that allows users to analyze the English accent of a speaker from a provided video URL. It leverages robust open-source tools like `yt-dlp` for video downloading, `FFmpeg` for audio extraction, and `SpeechBrain` for advanced English accent classification. The application is designed to be user-friendly, providing clear results and a smooth experience through a responsive web interface.

This tool is particularly useful for:
* Language learners and educators to get feedback on accent.
* Researchers studying speech and accents.
* Content creators to understand their audience's accent distribution.
* Anyone curious about the English accent in a video.

## Features

* **Video Download:** Supports downloading videos from various public platforms, including YouTube, Loom, and direct MP4 links, using `yt-dlp`.
* **Audio Extraction:** Extracts high-quality audio (16kHz, mono WAV) from the downloaded video using `FFmpeg`.
* **English Accent Classification:** Utilizes a pre-trained `SpeechBrain` model (`Jzuluaga/accent-id-commonaccent_ecapa`) to classify the English accent present in the audio.
* **Supported Accents:** The model is trained to recognize **16 distinct English accents**, including:
    * US
    * England
    * Australia
    * Indian
    * Canada
    * Bermuda
    * Scotland
    * African
    * Ireland
    * New Zealand
    * Wales
    * Malaysia
    * Philippines
    * Singapore
    * Hong Kong
    * South Atlantic
* **Confidence Score:** Provides a confidence percentage for the detected accent.
* **Asynchronous Processing:** Uses `Flask-Executor` to handle long-running tasks (video download, audio extraction, accent analysis) in the background, keeping the UI responsive.
* **Responsive UI:** A clean and modern web interface built with HTML and Tailwind CSS, ensuring usability across various devices.
* **Temporary File Management:** Automatically creates and manages temporary directories for video and audio files, with robust cleanup mechanisms.

## Project Structure


rem_waste_accent_analyzer/
├── app.py                  # Main Flask application, handles routes, background tasks, and orchestrates modules.
├── video_processing.py     # Module for video downloading (yt-dlp) and audio extraction (FFmpeg).
├── accent_analysis.py      # Module for SpeechBrain model loading and accent detection logic.
├── templates/
│   └── index.html          # HTML template for the web interface.
└── static/
└── style.css           # Custom CSS for styling the UI (uses Tailwind CSS).


## Setup Instructions

Follow these steps to set up and run the application on your local machine.

### Prerequisites

Before you begin, ensure you have the following installed:

* **Python 3.8+**: Download from [python.org](https://www.python.org/downloads/).
* **FFmpeg**: A powerful multimedia framework required for audio extraction.
    * **Windows**: Download a static build from [ffmpeg.org/download.html](https://ffmpeg.org/download.html). Extract it and add the `bin` directory to your system's `PATH` environment variable.
    * **macOS**: Install via Homebrew: `brew install ffmpeg`
    * **Linux (Ubuntu/Debian)**: `sudo apt update && sudo apt install ffmpeg`
* **yt-dlp**: A command-line program to download videos. It will be installed via `pip` but relies on `FFmpeg`.

### Installation Steps

1.  **Clone the Repository (or create the project structure manually):**
    If you have the project files already, navigate to your project's root directory. Otherwise, create the `rem_waste_accent_analyzer` folder and the `templates/` and `static/` subfolders as shown in the Project Structure.

2.  **Navigate to the Project Directory:**
    Open your terminal or command prompt and change to your project's root directory:
    ```bash
    cd path\to\rem_waste_accent_analyzer
    ```
    (Replace `path\to\rem_waste_accent_analyzer` with your actual path)

3.  **Create and Activate a Python Virtual Environment (Highly Recommended):**
    A virtual environment isolates your project's dependencies, preventing conflicts with other Python projects.

    ```bash
    python -m venv myenv
    ```
    * **On Windows:**
        ```bash
        .\myenv\Scripts\activate
        ```
    * **On macOS/Linux:**
        ```bash
        source myenv/bin/activate
        ```
    You should see `(myenv)` at the beginning of your terminal prompt, indicating the virtual environment is active.

4.  **Install Python Dependencies:**
    With your virtual environment activated, install all required Python libraries. This step is crucial for resolving potential version compatibility issues.

    ```bash
    # Uninstall existing versions for a clean slate (important!)
    pip uninstall speechbrain transformers torchaudio huggingface_hub numpy scipy tqdm Flask Flask-Executor yt-dlp -y

    # Install the latest compatible versions
    pip install --upgrade speechbrain transformers torchaudio huggingface_hub numpy scipy tqdm Flask Flask-Executor yt-dlp
    ```
    * **Note on `UserWarning`:** You might see a `UserWarning: Requested Pretrainer collection using symlinks on Windows...` during model loading. This is an informational message from SpeechBrain/PyTorch/HuggingFace about internal file handling and can generally be ignored as it does not prevent the application from functioning.

5.  **Manual Hugging Face Cache Cleanup (Optional, if issues persist):**
    If you continue to face model loading errors after step 4, you might need to manually clear the Hugging Face cache.
    * Delete the entire folder at `D:\Accent Detection\rem_waste\.hf_cache` (or wherever your `HF_HOME` environment variable points to within your project).
    * Then, try running the application again. This will force a fresh download of the model files.

## Usage

1.  **Run the Flask Application:**
    Ensure your virtual environment is active, then run the main application file:
    ```bash
    python app.py
    ```
    You will see output in your terminal indicating the Flask server is running, typically on `http://127.0.0.1:5000/`.

2.  **Access the Web Interface:**
    Open your web browser and navigate to the address provided by Flask (e.g., `http://127.0.0.1:5000/`).

3.  **Analyze an Accent:**
    * Enter a public video URL (e.g., a YouTube video link, a Loom link, or a direct link to an MP4 file) into the "Video URL" input field.
    * Click the "Analyze Accent" button.
    * The application will display a status message ("Initiating analysis...", "Still processing...").
    * Once the analysis is complete, the detected English accent, a confidence score, and a brief summary will appear on the page.

## Error Handling & Troubleshooting

* **"Video download failed: yt-dlp failed: ERROR: unable to open for writing: [Errno 2] No such file or directory..."**:
    * This usually indicates `yt-dlp` cannot write to the temporary directory.
    * **Solution:** Ensure the `rem_waste_accent_analyzer` folder and its `temp_files` subdirectory have full write permissions for your user account. The `subprocess.run` with `cwd` set in `video_processing.py` is designed to mitigate this, but underlying OS permissions can still interfere. Running your terminal/command prompt as Administrator might temporarily resolve this for testing.
* **"Error opening 'D:\...\\audio_...wav': System error." (during accent detection):**
    * This indicates `SpeechBrain` is having trouble accessing the audio file.
    * **Solution:** This was addressed by converting the path to a relative path (`os.path.relpath`) before passing it to `detect_accent`. Ensure your `app.py` and `accent_analysis.py` files are updated to the latest versions provided in the previous responses.
* **"Error loading SpeechBrain model: No huggingface_hub attribute cached_download" or "There is no such class as speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2"**:
    * These are version compatibility issues between SpeechBrain and its dependencies.
    * **Solution:** Follow the "Install Python Dependencies" step (Step 4) very carefully, including the `pip uninstall` command for a clean installation. If it persists, try the "Manual Hugging Face Cache Cleanup" (Step 5).
* **"Analysis completed successfully!" but no results on webpage:**
    * This means the backend is working, but the frontend isn't displaying the data.
    * **Solution:** Ensure your `templates/index.html` file includes the latest `showResults` function as provided, which explicitly removes the `hidden` class and sets `style.display = 'block'` for the results container. Check your browser's developer console (F12) for any JavaScript errors or the `console.log("Received data from backend:", data);` output to see the exact data structure.

## Technologies Used

* **Backend:**
    * [Flask](https://flask.palletsprojects.com/): Python web framework.
    * [Flask-Executor](https://flask-executor.readthedocs.io/): For running background tasks.
    * [yt-dlp](https://github.com/yt-dlp/yt-dlp): Video downloading.
    * [FFmpeg](https://ffmpeg.org/): Audio extraction and conversion.
    * [SpeechBrain](https://speechbrain.github.io/): Open-source speech toolkit for accent classification.
    * [PyTorch](https://pytorch.org/): Deep learning framework (underpins SpeechBrain).
    * [Hugging Face Hub](https://huggingface.co/): For hosting and accessing pre-trained models.
* **Frontend:**
    * HTML5
    * [Tailwind CSS](https://tailwindcss.com/): Utility-first CSS framework for rapid UI development.
    * JavaScript (Fetch API for AJAX, DOM manipulation).

## License

This project is open-source and available under the [MIT License](https://opensource.org/licenses/MIT).