GPT-SoVITS

This is a mirror of the original weights for use with TTSDB.

Original weights: https://huggingface.co/lj1995/GPT-SoVITS
Original code: https://github.com/RVC-Boss/GPT-SoVITS

GPT-SoVITS is a few-shot voice conversion and text-to-speech system by RVC-Boss. It can clone a voice from about 1 minute of training data (few-shot), and it also supports zero-shot TTS from a reference clip alone, as well as cross-lingual synthesis.

Original Work

This model was created by RVC-Boss; this repository only mirrors the weights. Please cite the original work if you use this model:

@misc{RVCBoss2024,
  author = {RVC-Boss},
  title = {GPT-SoVITS: 1 min voice data can also be used to train a good TTS model},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/RVC-Boss/GPT-SoVITS}},
}

Installation

pip install ttsdb-gpt-sovits

Usage

from ttsdb_gpt_sovits import GPTSoVITS

# Load the model (downloads weights automatically)
model = GPTSoVITS(model_id="ttsds/gpt-sovits")

# Synthesize speech
audio, sample_rate = model.synthesize(
    text="Hello, this is a test of GPT-SoVITS.",
    reference_audio="path/to/reference.wav",
    text_reference="Transcript of the reference audio.",
    language="eng",
)

# Save the output
model.save_audio(audio, sample_rate, "output.wav")
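
Since `synthesize` takes a reference clip by path, it can help to inspect that file before loading the model. The following is a minimal sketch using only Python's standard `wave` module; it is not part of `ttsdb_gpt_sovits`. The helper names `describe_wav` and `check_reference` are hypothetical, and the 3 to 10 second bounds are illustrative defaults, not limits documented by this package. It only reads uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    return frames / sample_rate, sample_rate, channels


def check_reference(path, min_seconds=3.0, max_seconds=10.0):
    """Raise ValueError if the clip duration is outside the given bounds.

    The default bounds are assumptions chosen for illustration; adjust them
    to whatever works for your reference recordings.
    """
    duration, sample_rate, channels = describe_wav(path)
    if not (min_seconds <= duration <= max_seconds):
        raise ValueError(
            f"{path}: duration {duration:.2f}s is outside "
            f"[{min_seconds}, {max_seconds}] seconds"
        )
    return duration, sample_rate, channels


# Example (hypothetical path):
# duration, rate, channels = check_reference("path/to/reference.wav")
```

Running the check first means a bad reference file fails fast with a clear message, before any weights are downloaded.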

Model Details

Property	Value
Sample Rate	32000 Hz
Parameters	167M
Architecture	Autoregressive, Non-Autoregressive, GPT, VITS
Languages	English, Chinese, Japanese
Release Date	2024-01-16

Training Data

Internal Dataset (2000 hours)

License

Weights: MIT License
Code: MIT License

Please refer to the original repositories for full license terms.
