---
license: cc
datasets:
- speechcolab/gigaspeech
language:
- th
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
tags:
- flow-matching
- f5-tts
- thai
- finetuning
---

[🔊 Model Checkpoints](https://huggingface.co/biodatlab/ThonburianTTS) | [🤗 Gradio Demo](https://github.com/biodatlab/thonburian-tts/blob/main/gradio_app.py) | [📄 ThonburianTTS Paper](https://ieeexplore.ieee.org/document/11320472) | [Colab Notebook](https://colab.research.google.com/drive/1vIwNMjsyILluNT0l7I8KduS7S2Bhj9ra?usp=sharing) | [GitHub](https://github.com/biodatlab/thonburian-tts)

## **Thonburian TTS**

**Thonburian TTS** is a **Thai Text-to-Speech (TTS)** engine built on top of [F5-TTS](https://github.com/SWivid/F5-TTS). It generates **natural and expressive Thai speech** using **Flow-Matching diffusion techniques** and can **mimic reference voices** from short audio samples.

The system supports:

- **Thai language generation** (`language="th"`)
- **Reference-based voice cloning** using short audio clips
- High-quality synthesis with controllable speed and silence trimming

## **Model Checkpoints**

| Model Component | Description | URL |
| --------------- | ----------- | --- |
| **F5-TTS Thai** | Flow Matching-based Thai TTS models | [Link](https://huggingface.co/biodatlab/ThonburianTTS/tree/main/megaF5) |
| **F5-TTS IPA** | Flow Matching-based Thai-IPA TTS models | [Link](https://huggingface.co/biodatlab/ThonburianTTS/tree/main/megaIPA) |

## **Quick Usage**

### **Installation**

Install the dependencies:

```bash
pip install torch cached-path librosa transformers f5-tts
sudo apt install ffmpeg
```

### **Clone GitHub**

Clone the repository and run the snippets below from its root directory so that the `flowtts` package can be imported:

```bash
git clone https://github.com/biodatlab/thonburian-tts.git
cd thonburian-tts
```

#### **Loading Thai Script based Models**

```py
from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig
import torch

# Configure the F5-TTS model trained on Thai script
model_config = ModelConfig(
    language="th",
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaF5/mega_f5_last.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt",
    vocoder="vocos",
    # Use the GPU when one is available, otherwise fall back to the CPU
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0
)

pipeline = FlowTTSPipeline(model_config, audio_config)
```

#### **Loading IPA based Models**

```py
from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig
import torch

# Configure the F5-TTS model trained on Thai-IPA input
model_config = ModelConfig(
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaIPA/model_last_prune.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaIPA/mega_vocab_ipa.txt",
    vocoder="vocos",
    # Use the GPU when one is available, otherwise fall back to the CPU
    device="cuda" if torch.cuda.is_available() else "cpu"
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0
)

pipeline = FlowTTSPipeline(model_config, audio_config)
```

## **Example Outputs**

🎵 Sample 1 – Single-speaker Thai Normal Text

🎵 Sample 2 – Single-Speaker Thai Code-mixed Text

🎵 Sample 3 – Multi-Speaker Conversational Speech
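
The `checkpoint` and `vocab_file` values in the Quick Usage snippets are `hf://` paths that point into the `biodatlab/ThonburianTTS` repository. If you want to fetch those files ahead of time (for example, on a machine that will later run offline), the following is a minimal sketch of one way to do it. It assumes that the first two path segments after `hf://` are the namespace and repository name, and it uses `hf_hub_download` from `huggingface_hub`, which is normally installed alongside `transformers`. The helper names `split_hf_path` and `download` are hypothetical and are not part of `flowtts`.

```python
from typing import NamedTuple


class HFPath(NamedTuple):
    repo_id: str   # e.g. "biodatlab/ThonburianTTS"
    filename: str  # path of the file inside the repository


def split_hf_path(uri: str) -> HFPath:
    """Split an hf:// URI into (repo_id, filename).

    Assumption: the first two segments are <namespace>/<repo>.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://<namespace>/<repo>/<file>, got {uri!r}")
    return HFPath(repo_id="/".join(parts[:2]), filename="/".join(parts[2:]))


def download(uri: str) -> str:
    """Download the file into the local Hugging Face cache and return its path."""
    # Imported lazily so that split_hf_path works without the dependency.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_hf_path(uri)
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Calling `download("hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt")` requires network access on the first run; later calls reuse the cached copy.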

---

## **Developers**

- [Looloo Technology](https://loolootech.com/)
- [Biomedical and Data Lab, Mahidol University](https://biodatlab.github.io/)

## **Citation**

If you use **ThonburianTTS** in your research, please cite:

```
@INPROCEEDINGS{11320472,
  author={Aung, Thura and Sriwirote, Panyut and Thavornmongkol, Thanachot and Pipatsrisawat, Knot and Achakulvisut, Titipat and Aung, Zaw Htet},
  booktitle={2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)},
  title={ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech},
  year={2025},
  volume={},
  number={},
  pages={1-6},
  keywords={Adaptation models;Codes;Accuracy;Error analysis;Phonetics;Robustness;Natural language processing;Text to speech;Noise measurement;Research and development;Thai text-to-speech;Flow matching;F5-TTS},
  doi={10.1109/iSAI-NLP66160.2025.11320472}}
```

```
Thura Aung, Panyut Sriwirote, Thanachot Thavornmongkol, Knot Pipatsrisawat, Titipat Achakulvisut, Zaw Htet Aung, "ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech", 2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Phuket, Thailand, 2025, pp. 1-6, doi: 10.1109/iSAI-NLP66160.2025.11320472.
```

## **License**

The **models** are released under the [Creative Commons Attribution Non-Commercial ShareAlike 4.0 License (CC BY-NC-SA 4.0)](LICENSE-CC-BY-NC-SA).

## Acknowledgement

We would like to acknowledge NSTDA Supercomputer Center (ThaiSC) project \#pv824003 for providing computing resources for this work.