---
title: Voice Cloning
emoji: "🗣️"
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "3.0"
app_file: app.py
---
# Voice Cloning
This Space allows users to clone voices using a pre-trained model. Upload a reference audio file, type your text, and hear the result!
**Usage Instructions:**
1. Upload your reference voice file
2. Enter text to synthesize
3. Click **Submit** and listen to the cloned voice output
**Notes:**
- Requires a moderately powerful CPU.
- For faster synthesis, consider enabling GPU hardware under the Space's Settings.
# XTTS v2 Voice Cloning Demo (Coqui TTS)
This demo clones a speaker's voice from a short reference sample and synthesizes text in multiple languages using the XTTS v2 model.
Contents:
- `clone_voice.py`: CLI script to run voice cloning
Requirements:
- Python 3.9–3.11 recommended
- Windows, macOS, or Linux
## 1) Setup (recommended: virtual environment)
Windows (PowerShell):
```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
```
macOS/Linux (bash):
```
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
```
### CPU-only install
```
pip install TTS
```
### GPU (CUDA) install (Windows/Linux)
1) Install a CUDA-enabled PyTorch build compatible with your CUDA version. Example for CUDA 12.1:
```
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
```
2) Then install Coqui TTS:
```
pip install TTS
```
3) Verify CUDA availability (optional):
```
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```
If this prints `False` when you expected `True`, you likely installed a CPU-only PyTorch build or one built for a different CUDA version than your system provides.
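To tell those cases apart, the following minimal sketch inspects the installed PyTorch build. The helper names `diagnose` and `report` are hypothetical and not part of this repository. It relies on `torch.version.cuda`, which is `None` for CPU-only builds.

```python
def diagnose(cuda_available, build_cuda_version):
    """Explain the most likely reason for the CUDA check result.

    cuda_available: value of torch.cuda.is_available()
    build_cuda_version: value of torch.version.cuda (None on CPU-only builds)
    """
    if cuda_available:
        return "CUDA is usable"
    if build_cuda_version is None:
        return "CPU-only PyTorch build installed"
    return "CUDA build present, but no usable GPU or driver was found"


def report():
    """Run the diagnosis against the installed PyTorch, if any."""
    try:
        import torch  # imported lazily so the helper loads without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    return diagnose(torch.cuda.is_available(), torch.version.cuda)
```

Calling `print(report())` inside the activated virtual environment prints one of the messages above.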
## 2) Prepare a reference voice sample
- Short clip: 6–15 seconds is usually enough.
- Clean speech, minimal background noise, no music.
- Mono WAV (16–48 kHz recommended). Many formats work, but WAV is safest.
- Place the file in this folder, e.g., `reference_voice.wav`.
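The guidelines above can be checked before running the model. The following is a minimal sketch using only Python's standard `wave` module, so it only reads uncompressed PCM WAV files. The function names `describe_wav` and `check_reference` are hypothetical, and the thresholds simply mirror the recommendations in this section.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / float(rate),
        }


def check_reference(info, min_s=6.0, max_s=15.0):
    """Return a list of warnings; an empty list means the clip fits the guidelines."""
    warnings = []
    if info["channels"] != 1:
        warnings.append("not mono")
    if not 16000 <= info["sample_rate_hz"] <= 48000:
        warnings.append("sample rate outside 16-48 kHz")
    if not min_s <= info["duration_s"] <= max_s:
        warnings.append("duration outside 6-15 s")
    return warnings
```

For example, `check_reference(describe_wav("reference_voice.wav"))` returns the list of guideline violations for the file named above. These are recommendations rather than hard limits, so a warning does not necessarily mean the clip will fail.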
## 3) Run the demo
From this `demotask` directory:
CPU:
```
python clone_voice.py --text "Ok signore, l'ho completato e qui ci sono i file WAV di riferimento." --speaker_wav "reference1.wav" --language it --output "output_it.wav" --device cpu
```
On first run, the model `tts_models/multilingual/multi-dataset/xtts_v2` is downloaded automatically. The result is saved to the path given by `--output` (`output_it.wav` in this example).
Common language codes: `en`, `it`, `es`, `fr`, `de`, `pt`, `pl`, `nl`, `tr`, `ru`, `zh`, `ja`, `ko`.
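For orientation, here is a rough sketch of how a CLI with these flags could be wired to the Coqui TTS Python API. It is not the actual contents of `clone_voice.py`. The defaults (`en`, `output.wav`) and the function names `build_parser` and `synthesize` are assumptions. The `TTS(...).to(device)` and `tts_to_file(...)` calls come from the `TTS.api` module of recent Coqui TTS releases.

```python
import argparse

MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_parser():
    """Build a parser with the flags used in the command above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="XTTS v2 voice cloning (sketch)")
    parser.add_argument("--text", required=True, help="text to synthesize")
    parser.add_argument("--speaker_wav", required=True, help="reference voice sample")
    parser.add_argument("--language", default="en", help="language code, e.g. en, it")
    parser.add_argument("--output", default="output.wav", help="output WAV path")
    parser.add_argument("--device", choices=["cpu", "cuda"], default=None)
    return parser


def synthesize(args):
    """Load XTTS v2 and write the cloned speech to args.output."""
    # Heavy imports are deferred so that --help works without loading them.
    import torch
    from TTS.api import TTS

    # Fall back to auto-detection when --device is not given.
    device = args.device or ("cuda" if torch.cuda.is_available() else "cpu")
    tts = TTS(MODEL_NAME).to(device)
    tts.tts_to_file(
        text=args.text,
        speaker_wav=args.speaker_wav,
        language=args.language,
        file_path=args.output,
    )
    return args.output
```

A script built this way would end with `synthesize(build_parser().parse_args())`.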
## 4) Troubleshooting
- CUDA not used: Ensure you installed a CUDA-enabled PyTorch (see above) and your GPU drivers/CUDA runtime are installed. Then use `--device cuda`.
- Out of memory (OOM): Try CPU mode or shorter text; ensure no other GPU-heavy apps are running.
- Reference file not found: Check the `--speaker_wav` path.
- Bad audio quality: Use a cleaner/longer reference sample, reduce background noise, and avoid clipping. Try 16 kHz or 22.05/24/44.1 kHz mono WAV.
- Slow on CPU: This is expected. GPU is recommended for speed.
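One way to apply the "shorter text" advice is to split long input into sentence-sized chunks and synthesize them one at a time. The sketch below is a hypothetical helper, not part of `clone_voice.py`. The 200-character limit is an arbitrary illustrative value, and the splitting rule (break after `.`, `!`, or `?`) is deliberately simple.

```python
import re


def split_text(text, max_chars=200):
    """Group sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as a separate `--text` value with its own `--output` file.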
## 5) Notes
- This script auto-selects CUDA if available when `--device` is not provided.
- For repeatable environments, consider pinning versions in a `requirements.txt`.
- Model: `tts_models/multilingual/multi-dataset/xtts_v2`.