# 🗣️ Accent Identifier

This tool identifies a **speaker's accent** from video or audio input. It accepts file uploads and URLs, including **direct `.mp4` links**, **Loom videos**, and **YouTube-style links**, and runs inference with a pretrained deep learning model from [SpeechBrain](https://speechbrain.readthedocs.io/en/latest/index.html).

## 🚀 Demo

Try it live on [Hugging Face Spaces](https://pheire-accent-detector.hf.space).

---

## 📦 Features

* 🎥 Accepts video/audio uploads (`.mp4`, `.wav`, `.mp3`)
* 🌐 Handles URLs (Loom share links, direct `.mp4` links, YouTube)
* 🧠 Classifies accent using a pretrained `speechbrain` model
* 📊 Returns top prediction and top-3 probabilities
* ⚡ Fast and easy UI built with [Gradio](https://gradio.app)

---

## 🧪 Example Inputs

* `https://www.loom.com/share/abc123`
* `https://yourdomain.com/sample.mp4`
* Uploaded audio: `voice_sample.wav`
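
As a rough illustration of how these three input types could be told apart before downloading anything, here is a minimal sketch. The function name `classify_source`, the returned labels, and the host checks are assumptions made for this example; they are not taken from `app.py`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions listed under Features.
MEDIA_EXTENSIONS = {".mp4", ".wav", ".mp3"}


def classify_source(source: str) -> str:
    """Label an input as 'loom', 'youtube', 'direct', 'upload', or 'unknown'.

    Hypothetical helper: the labels and host checks are illustrative only.
    """
    parsed = urlparse(source)
    # Anything that is not an http(s) URL is treated as a local/uploaded file.
    if parsed.scheme not in ("http", "https"):
        return "upload"

    host = (parsed.hostname or "").lower()
    if host == "loom.com" or host.endswith(".loom.com"):
        return "loom"
    if host == "youtu.be" or host == "youtube.com" or host.endswith(".youtube.com"):
        return "youtube"
    # A URL whose path ends in a known media extension is a direct link.
    if PurePosixPath(parsed.path).suffix.lower() in MEDIA_EXTENSIONS:
        return "direct"
    return "unknown"
```

Each label can then be routed to a matching downloader (for example `yt-dlp` for YouTube links and `requests` for direct files), while uploads skip the download step.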

---

## 🛠️ Installation

```bash
git clone https://github.com/yourusername/accent-identifier.git
cd accent-identifier

# Create virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install dependencies
pip install -r requirements.txt
```

### requirements.txt

```
speechbrain
gradio
torchaudio
torch
ffmpeg-python
yt-dlp
requests
```

Make sure `ffmpeg` is installed and available on your system `PATH`.
You can check with: `ffmpeg -version`

---

## ▶️ Run Locally

```bash
python app.py
```

This starts a local Gradio server. Open `http://localhost:7860` in your browser to use the interface.

---

## 🧠 Model Details

* **Model**: `Jzuluaga/accent-id-commonaccent_ecapa`
* **Framework**: [SpeechBrain](https://speechbrain.readthedocs.io/)
* **Classes**: US, UK, Australia, Canada, India, etc.
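
The following sketch shows one way to load this model and rank its outputs, assuming it works with SpeechBrain's `EncoderClassifier` interface. The helper names `top_k` and `classify_accent`, the `savedir` path, and the way labels are read from the label encoder are assumptions for illustration, not the code in `app.py`. Depending on the model configuration, the returned scores may be similarity scores or log-probabilities rather than normalized probabilities; the sketch only ranks them.

```python
from typing import List, Sequence, Tuple


def top_k(scores: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify_accent(wav_path: str, k: int = 3) -> List[Tuple[str, float]]:
    """Score a 16 kHz mono WAV file (the model is downloaded on first use)."""
    # Imported lazily so top_k() works without SpeechBrain installed.
    try:
        from speechbrain.inference.classifiers import EncoderClassifier  # SpeechBrain >= 1.0
    except ImportError:
        from speechbrain.pretrained import EncoderClassifier  # older releases

    classifier = EncoderClassifier.from_hparams(
        source="Jzuluaga/accent-id-commonaccent_ecapa",
        savedir="pretrained_models/accent-id-commonaccent_ecapa",  # assumed cache directory
    )
    out_prob, _score, _index, _text_lab = classifier.classify_file(wav_path)
    scores = out_prob.squeeze(0).tolist()

    # Assumption: the label encoder maps consecutive integer indices to accent names.
    encoder = classifier.hparams.label_encoder
    labels = [encoder.ind2lab[i] for i in range(len(scores))]
    return top_k(scores, labels, k)
```

Keeping the ranking step separate from model loading makes it easy to check on its own and to change `k` without touching the inference code.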

---

## 📂 Project Structure

```
accent-identifier/
├── app.py               # Main Gradio app
├── requirements.txt     # Dependencies
└── README.md            # You are here
```

---

## 🧩 Notes

* Loom support relies on Loom's internal API and may break if the endpoint changes.
* Audio is extracted to `.wav` with `ffmpeg` as 16 kHz mono for model compatibility.
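
A minimal sketch of the extraction step, assuming `ffmpeg` is called directly through `subprocess` rather than through the `ffmpeg-python` wrapper listed in `requirements.txt`. The function names and the `_16k.wav` output suffix are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(src: PathLike, dst: PathLike, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes mono audio at the given sample rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input video or audio file
        "-vn",                    # drop any video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # resample, 16 kHz by default
        str(dst),
    ]


def extract_audio(src: PathLike, dst: Optional[PathLike] = None) -> Path:
    """Convert src to a 16 kHz mono WAV file and return the output path."""
    src = Path(src)
    # Hypothetical naming scheme: "clip.mp4" becomes "clip_16k.wav".
    out = Path(dst) if dst is not None else src.with_name(src.stem + "_16k.wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, out), check=True, capture_output=True)
    return out
```

Building the argument list in its own function avoids shell quoting problems with unusual file names and lets the command be inspected without running `ffmpeg`.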




---
title: Accent Detector
emoji: 🏢
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference