---
language: si
license: cc-by-nc-4.0
tags:
- text-to-speech
- tts
- sinhala
- f5-tts
- voice-cloning
datasets:
- pathnirvana
metrics:
# Placeholder for where you might add evaluation metrics like PESQ or MOS score later
# Leave as is if you don't have them yet
---

## 🚀 Overview

`tts-si-F5-TTS` is a state-of-the-art Text-to-Speech (TTS) model tailored for the **Sinhala (සිංහල)** language. It is built upon the advanced **F5-TTS (Flow-Matching)** architecture, enabling high-quality, natural-sounding speech generation.

This model is a significant resource for the Sinhala language community, supporting research, content creation, and accessibility initiatives.

## 🛠️ Model Details

| Attribute | Value |
| :--- | :--- |
| **Model ID** | `tharindumihi/tts-si-F5-TTS` |
| **Architecture** | F5-TTS (Flow-Matching based Text-to-Speech) |
| **Primary Language** | Sinhala (`si`) |
| **Estimated Model Size** | Approx. 650 million parameters (based on a 1.25 GB checkpoint) |
| **Inference Library** | F5-TTS Library |

***

## 📊 Training Data

The model was trained on a single-speaker, custom Sinhala dataset.

| Attribute | Value |
| :--- | :--- |
| **Dataset Name** | **Pathnirvana** |
| **Total Duration** | 7 hours 41 minutes 18 seconds |
| **Total Utterances** | 3,300 files |
| **Speaker Count** | 1 (Monolingual, single speaker) |

***

## 🎯 Intended Uses & Performance

### Primary Intended Uses

* Research and development in Sinhala speech synthesis.
* Generating voiceovers for **non-commercial** educational content, documentaries, and apps.
* Accessibility tools for text-to-speech for Sinhala speakers.

### Performance

* **Speech Quality:** The model produces **high-quality and natural-sounding Sinhala speech.**
* **Voice Cloning:** It supports **Zero-Shot Voice Cloning** for Sinhala. Users should note that the voice similarity may be variable and not perfectly match the reference audio.

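
The parameter count in the Model Details table is derived from the checkpoint size, so it depends on how many bytes each weight occupies. As a rough sketch, assuming the 1.25 GB figure means decimal gigabytes and the file holds nothing but weights (both are assumptions, not stated on this card), the arithmetic looks like this:

```bash
# Checkpoint size from the Model Details table, taken as decimal gigabytes (assumption).
ckpt_bytes=1250000000

# Parameter count = file size / bytes per parameter.
# This ignores any non-weight data that may be stored in the file.
params_16bit_m=$(( ckpt_bytes / 2 / 1000000 ))   # if weights are stored in 16 bits
params_32bit_m=$(( ckpt_bytes / 4 / 1000000 ))   # if weights are stored in 32 bits

echo "16-bit weights: ~${params_16bit_m}M parameters"
echo "32-bit weights: ~${params_32bit_m}M parameters"
```

The table's figure of roughly 650 million is close to the 16-bit case. If the checkpoint actually stores 32-bit weights, the true count would be about half that, so the estimate is best read as an upper bound until the weight precision is confirmed.
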

***

## 💻 How to Use

To use this model, you will need the official `f5-tts` Python package.

### 1. Install the necessary libraries

```bash
pip install f5-tts
```

### 2. Clone the original repository

```bash
git clone https://github.com/JarodMica/F5-TTS.git
cd F5-TTS
python -m venv venv
source venv/bin/activate
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -e .
pip install tensorboard
```

### 3. Download the model files

Use `huggingface-cli` or `git clone` to get the files (`model_230000_reduced.pt` and `vocab.txt`).

**Alternative method (manual download):** Navigate to the "Files and versions" tab on the Hugging Face repo page and download `model_230000_reduced.pt` and `vocab.txt` manually.

Create a folder named `ckpts/f5-TTS` and place all files directly inside it.

### Note on Inference CLI

**The official `f5-tts` inference CLI is currently known to throw errors.** Therefore, the recommended method for local testing and inference is to use the custom Gradio web interface provided with this model.

### Using the Custom Gradio UI for GUI Inference

4. **Download the custom UI files:** Download the provided `custom_gradio.py` and `infer_cli_custom.py` files from this repository.
5. **Place the files:** Place both `custom_gradio.py` and `infer_cli_custom.py` inside your local F5-TTS installation directory: `[Root_Directory]/F5-TTS/src/f5_tts/infer/`
6. **Run the UI:** From the F5-TTS root directory, run the Gradio application with the Python module runner. The `-m` flag takes a dotted module name rather than a file path, and it relies on the editable install (`pip install -e .`) from step 2:
   ```bash
   python -m f5_tts.infer.custom_gradio
   ```
7. **Access:** This command starts a local Gradio server. Open the local URL it displays (typically [http://127.0.0.1:7860](http://127.0.0.1:7860)) in your web browser to use the model via the graphical interface.

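
Step 3 mentions `huggingface-cli` without showing a command. A minimal sketch of that download follows, assuming the CLI from the `huggingface_hub` package is installed and that the two file names listed in step 3 are current. The function name `download_model_files` is a hypothetical helper for illustration, not part of F5-TTS or this repository; wrapping the command in a function means nothing is downloaded until it is called.

```bash
# Hypothetical helper (not part of F5-TTS): fetch the checkpoint and vocabulary
# into the folder this card asks for.
# Assumes the Hugging Face CLI is available: pip install -U "huggingface_hub[cli]"
download_model_files() {
  # Destination folder; defaults to the one named in step 3.
  local dest="${1:-ckpts/f5-TTS}"
  mkdir -p "$dest"

  # Download only the two files needed for inference from the model repo.
  huggingface-cli download tharindumihi/tts-si-F5-TTS \
    model_230000_reduced.pt vocab.txt \
    --local-dir "$dest"
}
```

Calling `download_model_files` with no argument writes to `ckpts/f5-TTS`; pass another path as the first argument to change the destination. On recent `huggingface_hub` releases the same subcommand is also exposed as `hf download`.
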
## 🙏 Acknowledgements

This project would not have been possible without the foundational work and the data provided by the following entities:

* **F5-TTS Framework:** Deepest gratitude to the **F5-TTS developers** for creating the robust training and inference framework that was used to develop this Sinhala model.
* **Pathnirvana Dataset:** We acknowledge the **Pathnirvana project/contributors** for providing the essential high-quality Sinhala speech data used to train this model.

***

## 📜 Licensing and Terms of Use

### License

This model is licensed under the **Creative Commons Attribution Non Commercial 4.0 International License (`cc-by-nc-4.0`)**.

* **You are free to:** Share (copy and redistribute) and Adapt (remix, transform, and build upon) the material.
* **Under the following terms:**
    * **Attribution (BY):** You must give appropriate credit.
    * **NonCommercial (NC):** You may **not** use the material for commercial purposes.

Please review the full license terms by following the link in the license metadata at the top of this card.

***

## ⚠️ Ethical Considerations and Limitations

All synthetic voice technology carries potential risks. Users must adhere to the model's license and ethical guidelines.

### Misuse Policy

The use of this model to generate audio that impersonates, deceives, or violates the privacy or rights of individuals or groups is **strictly prohibited**. It must not be used for illegal or unethical activities, including creating malicious deepfakes or spam.