GPT-SoVITS
This is a mirror of the original weights for use with TTSDB.
- Original weights: https://huggingface.co/lj1995/GPT-SoVITS
- Original code: https://github.com/RVC-Boss/GPT-SoVITS
GPT-SoVITS is a few-shot voice conversion and text-to-speech system by RVC-Boss. It can clone a voice from as little as 1 minute of training data and supports both zero-shot and few-shot TTS, including cross-lingual synthesis.
Original Work
This model was created by the original authors. Please cite their work if you use this model:
@misc{RVCBoss2024,
  author = {RVC-Boss},
  title = {GPT-SoVITS: 1 min voice data can also be used to train a good TTS model},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/RVC-Boss/GPT-SoVITS}},
}
Installation
pip install ttsdb-gpt-sovits
Usage
from ttsdb_gpt_sovits import GPTSoVITS

# Load the model (downloads weights automatically)
model = GPTSoVITS(model_id="ttsds/gpt-sovits")

# Synthesize speech
audio, sample_rate = model.synthesize(
    text="Hello, this is a test of GPT-SoVITS.",
    reference_audio="path/to/reference.wav",
    text_reference="Transcript of the reference audio.",
    language="eng",
)

# Save the output
model.save_audio(audio, sample_rate, "output.wav")
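To synthesize several sentences with the same reference voice, the calls above can be wrapped in a small helper. This is a minimal sketch, not part of the ttsdb-gpt-sovits package: the name `synthesize_batch`, the `out_dir` parameter, and the `utterance_NNN.wav` naming scheme are hypothetical. It assumes only the `synthesize` and `save_audio` methods shown in the usage example, called with the same arguments.

```python
from pathlib import Path


def synthesize_batch(model, texts, reference_audio, text_reference,
                     language="eng", out_dir="outputs"):
    """Synthesize each text with one reference voice and save numbered WAVs.

    Hypothetical helper: it relies only on model.synthesize(...) and
    model.save_audio(...) as they are used in the usage example.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for index, text in enumerate(texts):
        # Same keyword arguments as the single-utterance example.
        audio, sample_rate = model.synthesize(
            text=text,
            reference_audio=reference_audio,
            text_reference=text_reference,
            language=language,
        )
        target = out_path / f"utterance_{index:03d}.wav"
        model.save_audio(audio, sample_rate, str(target))
        written.append(target)
    return written
```

With a model loaded as in the usage example, calling `synthesize_batch(model, ["First sentence.", "Second sentence."], "path/to/reference.wav", "Transcript of the reference audio.")` would return the list of written file paths.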
Model Details
| Property | Value |
|---|---|
| Sample Rate | 32000 Hz |
| Parameters | 167M |
| Architecture | Autoregressive, Non-Autoregressive, GPT, VITS |
| Languages | English, Chinese, Japanese |
| Release Date | 2024-01-16 |
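
The sample rate in the table determines how many samples correspond to one second of output. As a small worked example, the sketch below converts a sample count into a duration. The helper name `duration_seconds` is hypothetical, and it assumes the count refers to samples of a single (mono) channel, which this card does not state.

```python
SAMPLE_RATE_HZ = 32000  # sample rate from the Model Details table


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Return the clip length in seconds for a given number of samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


# 96000 samples at 32000 Hz is 3 seconds of audio.
print(duration_seconds(96000))  # 3.0
```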
Training Data
- Internal Dataset (2000 hours)
License
- Weights: MIT License
- Code: MIT License
Please refer to the original repositories for full license terms.
Links
- Original Code: https://github.com/RVC-Boss/GPT-SoVITS
- Original Weights: https://huggingface.co/lj1995/GPT-SoVITS
- TTSDB Package: ttsdb-gpt-sovits
- TTSDB GitHub: https://github.com/ttsds/ttsdb