
Model Checkpoints | Gradio Demo | Thonburian TTS Paper
Thonburian TTS
Thonburian TTS is a Thai Text-to-Speech (TTS) engine built on top of F5-TTS.
It generates natural and expressive Thai speech by leveraging Flow-Matching diffusion techniques and can mimic reference voices from short audio samples. The system supports:
- Thai language generation (language="th")
- Reference-based voice cloning using short audio clips
- High-quality synthesis with controllable speed and silence trimming
Model Checkpoints
| Model Component | Description | URL |
|---|---|---|
| F5-TTS Thai | Flow Matching-based Thai TTS models | Link |
| F5-TTS IPA | Flow Matching-based Thai-IPA TTS models | Link |
Quick Usage
Installation
Install dependencies:
pip install torch cached-path librosa transformers f5-tts
sudo apt install ffmpeg
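Before loading a model, it can help to confirm that the dependencies above are visible from Python. The following is a minimal sketch using only the standard library. The helper name `missing_dependencies` is hypothetical, and the import names `cached_path` and `f5_tts` are assumed to correspond to the `cached-path` and `f5-tts` packages.

```python
import importlib.util
import shutil

# Import names assumed to match the pip packages listed above
# (cached-path -> cached_path, f5-tts -> f5_tts).
PYTHON_MODULES = ["torch", "cached_path", "librosa", "transformers", "f5_tts"]


def missing_dependencies(modules=PYTHON_MODULES, binaries=("ffmpeg",)):
    """Return the names of modules and executables that cannot be found."""
    # find_spec locates a top-level module without importing it.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing
```

Calling `missing_dependencies()` returns an empty list when everything is installed, and otherwise the names that still need attention.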
Clone the GitHub Repository
git clone https://github.com/biodatlab/thonburian-tts.git
cd thonburian-tts
Loading Thai Script-based Models
from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig
import torch
# Configure F5-TTS model
model_config = ModelConfig(
    language="th",
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaF5/mega_f5_last.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt",
    vocoder="vocos",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0
)

pipeline = FlowTTSPipeline(model_config, audio_config)
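The `checkpoint` and `vocab_file` values use `hf://` paths. The source does not state how the pipeline resolves them. Assuming they follow the `hf://<org>/<repo>/<path>` layout that the `cached-path` package accepts for Hugging Face Hub files, the sketch below shows how such a path splits into a repository ID and a file path inside the repository. The helper `split_hf_uri` is hypothetical and is not part of `flowtts`.

```python
def split_hf_uri(uri: str) -> tuple[str, str]:
    """Split 'hf://<org>/<repo>/<path>' into (repo_id, path_in_repo)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # Assumed layout: the first two components name the repository,
    # everything after them is the file path inside it.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<repo>/<path>, got {uri!r}")
    org, repo, path_in_repo = parts
    return f"{org}/{repo}", path_in_repo


repo_id, filename = split_hf_uri(
    "hf://biodatlab/ThonburianTTS/megaF5/mega_f5_last.safetensors"
)
print(repo_id)   # biodatlab/ThonburianTTS
print(filename)  # megaF5/mega_f5_last.safetensors
```

This can be useful for checking that a path points at the intended repository and subfolder before a large checkpoint download starts.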
Loading IPA-based Models
from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig
import torch
# Configure F5-TTS model
model_config = ModelConfig(
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaIPA/model_last_prune.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaIPA/mega_vocab_ipa.txt",
    vocoder="vocos",
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0
)

pipeline = FlowTTSPipeline(model_config, audio_config)
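To give a feel for what `silence_threshold=-45` might mean, here is an illustrative sketch. It assumes the value is a level in decibels relative to full scale (dBFS), which the source does not confirm, and it trims sample by sample, whereas a real pipeline would more likely measure loudness over short windows. Both function names are hypothetical and are not part of `flowtts`.

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level in dB relative to full scale (1.0) to linear amplitude."""
    return 10.0 ** (db / 20.0)


def trim_silence(samples, threshold_db=-45.0):
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    limit = db_to_amplitude(threshold_db)
    # Indices of samples at or above the threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) >= limit]
    if not loud:
        return []
    # Keep everything between the first and last loud sample,
    # including any quiet samples in the middle.
    return samples[loud[0]:loud[-1] + 1]


samples = [0.0, 0.001, 0.2, -0.5, 0.003, 0.3, 0.002, 0.0]
print(round(db_to_amplitude(-45.0), 5))  # 0.00562
print(trim_silence(samples))             # [0.2, -0.5, 0.003, 0.3]
```

Under this assumption, a lower (more negative) threshold treats fewer samples as silence, so less audio is trimmed from the edges.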
Example Outputs
- Sample 1: Single-Speaker Thai Normal Text
- Sample 2: Single-Speaker Thai Code-mixed Text
- Sample 3: Multi-Speaker Conversational Speech
Developers
Citation
If you use ThonburianTTS in your research, please cite:
Thura Aung, Panyut Sriwirote, Thanachot Thavornmongkol, Knot Pipatsrisawat, Titipat Achakulvisut, Zaw Htet Aung, "ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech", 2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Phuket, Thailand, 2025, pp. 1-6,
License
The models are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
Acknowledgement
We would like to acknowledge the NSTDA Supercomputer Center (ThaiSC) project #pv824003 for providing computing resources for this work.

