---
language: si
license: cc-by-nc-4.0
tags:
- text-to-speech
- tts
- sinhala
- f5-tts
- voice-cloning
datasets:
- pathnirvana 
metrics:
---

## 🚀 Overview

`tts-si-F5-TTS` is a Text-to-Speech (TTS) model for the **Sinhala (සිංහල)** language. It is built on the **F5-TTS** architecture, a non-autoregressive model that generates speech with flow matching, and it produces natural-sounding Sinhala speech.

The model is intended as a resource for the Sinhala-speaking community, supporting research, content creation, and accessibility work.

## 🛠️ Model Details

| Attribute | Value |
| :--- | :--- |
| **Model ID** | `tharindumihi/tts-si-F5-TTS` |
| **Architecture** | F5-TTS (Flow-Matching based Text-to-Speech) |
| **Primary Language** | Sinhala (`si`) |
| **Estimated Model Size** | Approx. 650 million parameters (estimated from the 1.25 GB checkpoint size) |
| **Inference Library** | `f5-tts` (F5-TTS Python package) |
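
The parameter count above is inferred from file size, not read from the model itself. As a rough check, assuming the checkpoint stores only model weights (no optimizer state or duplicate EMA copy) and treating 1.25 GB as $1.25 \times 10^{9}$ bytes, the estimate depends on the number of bytes per parameter $b$:

$$
N \approx \frac{S_{\text{ckpt}}}{b}:\qquad
\frac{1.25 \times 10^{9}}{2} = 6.25 \times 10^{8}\ \ (\text{16-bit weights}),\qquad
\frac{1.25 \times 10^{9}}{4} \approx 3.1 \times 10^{8}\ \ (\text{32-bit weights})
$$

The ~650M figure therefore corresponds to 16-bit weights; if the checkpoint is stored in 32-bit precision, the same file size implies roughly half as many parameters.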
***

## 📊 Training Data

The model was trained on a single-speaker, custom Sinhala dataset.

| Attribute | Value |
| :--- | :--- |
| **Dataset Name** | **Pathnirvana** |
| **Total Duration** | 7 h 41 min 18 s |
| **Total Utterances** | 3,300 (audio files) |
| **Speaker Count** | 1 (Monolingual, single speaker) |
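
Assuming all 3,300 files count toward the total duration, the figures in the table work out to an average of roughly 8.4 seconds per utterance:

$$
T = 7 \times 3600 + 41 \times 60 + 18 = 27{,}678\ \text{s},\qquad
\bar{t} = \frac{27{,}678\ \text{s}}{3{,}300} \approx 8.4\ \text{s}
$$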
***

## 🎯 Intended Uses & Performance

### Primary Intended Uses
* Research and development in Sinhala speech synthesis.
* Generating voiceovers for **non-commercial** educational content, documentaries, and apps.
* Text-to-speech accessibility tools for Sinhala speakers.

### Performance
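* **Speech Quality:** The model produces natural-sounding Sinhala speech. No objective evaluation scores (such as MOS or PESQ) have been published for it yet.
* **Voice Cloning:** The model supports **zero-shot voice cloning** for Sinhala: given a short reference recording, it attempts to reproduce that speaker's voice without additional training. Similarity to the reference voice varies and may not match it closely, possibly because the model was trained on a single speaker.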
***

## 💻 How to Use

To use this model, you need the `f5-tts` Python package and a local clone of the F5-TTS source code.


### 1. Install the necessary libraries
```bash
pip install f5-tts
```
### 2. Clone the F5-TTS repository and set up a virtual environment
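```bash
git clone https://github.com/JarodMica/F5-TTS.git
cd F5-TTS

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate        # On Windows: venv\Scripts\activate

# PyTorch 2.4.0 builds for CUDA 11.8
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
# Install F5-TTS from the cloned source in editable mode
pip install -e .
pip install tensorboard
```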
### 3. Download the model files

Download the checkpoint (`model_230000_reduced.pt`) and `vocab.txt` from this repository, using either `huggingface-cli` or `git clone`.

**Alternative method (manual download):** Open the "Files and versions" tab on this model's Hugging Face page and download `model_230000_reduced.pt` and `vocab.txt`. Create a folder named `ckpts\f5-TTS` (`ckpts/f5-TTS` on Linux/macOS) and place both files directly inside it.

### Note on Inference CLI
**The official `f5-tts` inference CLI is currently known to throw errors.** Therefore, the recommended method for local testing and inference is to use the custom Gradio web interface provided with this model.

### Using the Custom Gradio UI for GUI Inference

4. **Download the custom UI files:** Download `custom_gradio.py` and `infer_cli_custom.py` from this repository.
5. **Place the files:** Copy both files into the `infer` package of your local F5-TTS clone:
    `[Root_Directory]\F5-TTS\src\f5_tts\infer\`
6. **Run the UI:** Execute the Gradio application using the Python module runner from the F5-TTS root directory:

```bash
# -m takes a dotted module name (no path separators, no .py extension)
python -m f5_tts.infer.custom_gradio
```

7. **Access the UI:** The command starts a local Gradio server. Open the URL it prints (by default [http://127.0.0.1:7860](http://127.0.0.1:7860)) in your web browser to use the model through the graphical interface.
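
If the UI fails to start or cannot find the model, checking that every file is in place rules out the simplest problems. The function below is a minimal sketch of such a check; `check_f5_layout` is a hypothetical name, and it assumes that `ckpts/f5-TTS` sits inside the F5-TTS root, which this card does not state explicitly.

```bash
# Hypothetical helper: check_f5_layout [F5_TTS_ROOT]
# Assumes ckpts/f5-TTS lives inside the F5-TTS root (not stated on this card).
check_f5_layout() {
  local root="${1:-.}"   # default: current directory
  local missing=0
  local f
  for f in \
    ckpts/f5-TTS/model_230000_reduced.pt \
    ckpts/f5-TTS/vocab.txt \
    src/f5_tts/infer/custom_gradio.py \
    src/f5_tts/infer/infer_cli_custom.py
  do
    if [ -f "$root/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 = all files present, 1 = at least one missing
}
```

After pasting or `source`-ing the function, run `check_f5_layout .` from the F5-TTS root. It prints one line per expected file and returns a non-zero status if anything is missing, so it can also guard a launch command with `&&`.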

## 🙏 Acknowledgements

This project would not have been possible without the foundational work and the data provided by the following entities:

* **F5-TTS Framework:** Deepest gratitude to the **F5-TTS developers** for creating the robust training and inference framework that was used to develop this Sinhala model.
* **Pathnirvana Dataset:** We acknowledge the **Pathnirvana project/contributors** for providing the essential high-quality Sinhala speech data used to train this model.
***
## 📜 Licensing and Terms of Use

### License

This model is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International License (`cc-by-nc-4.0`)**.

* **You are free to:** Share (copy and redistribute) and Adapt (remix, transform, and build upon) the material.
* **Under the following terms:**
    * **Attribution (BY):** You must give appropriate credit.
    * **NonCommercial (NC):** You may **not** use the material for commercial purposes.

Please review the full license terms by following the link in the license metadata at the top of this card.
***

## ⚠️ Ethical Considerations and Limitations

All synthetic voice technology carries potential risks. Users must adhere to the model's license and ethical guidelines.

### Misuse Policy

The use of this model to generate audio that impersonates, deceives, or violates the privacy or rights of individuals or groups is **strictly prohibited**. It must not be used for illegal or unethical activities, including creating malicious deepfakes or spam.