---
license: cc-by-nc-sa-4.0
datasets:
- speechcolab/gigaspeech
language:
- th
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
tags:
- flow-matching
- f5-tts
- thai
- finetuning
---

[🔊 Model Checkpoints](https://huggingface.co/biodatlab/ThonburianTTS) | [🤗 Gradio Demo](https://github.com/biodatlab/thonburian-tts/blob/main/gradio_app.py) | [📄 ThonburianTTS Paper](https://ieeexplore.ieee.org/document/11320472) | [Colab Notebook](https://colab.research.google.com/drive/1vIwNMjsyILluNT0l7I8KduS7S2Bhj9ra?usp=sharing) | [GitHub](https://github.com/biodatlab/thonburian-tts)
## **Thonburian TTS**
**Thonburian TTS** is a **Thai Text-to-Speech (TTS)** engine built on top of [F5-TTS](https://github.com/SWivid/F5-TTS).
It generates **natural and expressive Thai speech** by leveraging **Flow-Matching diffusion techniques** and can **mimic reference voices** from short audio samples. The system supports:
- **Thai language generation** (`language="th"`)
- **Reference-based voice cloning** using short audio clips
- High-quality synthesis with controllable speed and silence trimming
## **Model Checkpoints**
| Model Component | Description | URL |
| ---------------------- | ---------------------------------- | ---------------------------------------------------------------------------- |
| **F5-TTS Thai** | Flow Matching-based Thai TTS models | [Link](https://huggingface.co/biodatlab/ThonburianTTS/tree/main/megaF5) |
| **F5-TTS IPA** | Flow Matching-based Thai-IPA TTS models | [Link](https://huggingface.co/biodatlab/ThonburianTTS/tree/main/megaIPA) |
## **Quick Usage**
### **Installation**
Install dependencies:
```bash
pip install torch cached-path librosa transformers f5-tts
sudo apt install ffmpeg
```
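To confirm the installation before loading a model, a small check like the one below can help. It is a minimal sketch: the import names (`cached_path` for `cached-path`, `f5_tts` for `f5-tts`) are assumed to match the pip packages above, and `check_environment` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import shutil

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = ["torch", "cached_path", "librosa", "transformers", "f5_tts"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a mapping of dependency name -> whether it was found."""
    # find_spec locates a top-level module without importing it.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # ffmpeg is a system binary, so look for it on PATH instead.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:<14}{'ok' if found else 'MISSING'}")
```

Any entry reported as `MISSING` points to a package to reinstall before continuing.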
### **Clone the Repository**
```bash
git clone https://github.com/biodatlab/thonburian-tts.git
cd thonburian-tts
```
#### **Loading Thai Script-Based Models**
```py
import torch

from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig

# Configure the F5-TTS model trained on Thai script
model_config = ModelConfig(
    language="th",
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaF5/mega_f5_last.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt",
    vocoder="vocos",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0,
)

pipeline = FlowTTSPipeline(model_config, audio_config)
```
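The `checkpoint` and `vocab_file` values above use `hf://` paths. If you want to fetch those files ahead of time (for example, to work offline), the sketch below splits such a path into a repository ID and a file path and downloads it with `huggingface_hub.hf_hub_download`. It assumes the layout `hf://<org>/<repo>/<path>`; how `flowtts` itself resolves these paths is not described here, and `parse_hf_path` and `download_hf_path` are hypothetical helper names.

```python
from typing import NamedTuple


class HFPath(NamedTuple):
    repo_id: str   # e.g. "biodatlab/ThonburianTTS"
    filename: str  # path of the file inside the repository


def parse_hf_path(path: str) -> HFPath:
    """Split 'hf://<org>/<repo>/<path>' into a repo ID and a file path (assumed layout)."""
    prefix = "hf://"
    if not path.startswith(prefix):
        raise ValueError(f"not an hf:// path: {path!r}")
    parts = path[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<repo>/<path>, got {path!r}")
    org, repo, filename = parts
    return HFPath(repo_id=f"{org}/{repo}", filename=filename)


def download_hf_path(path: str) -> str:
    """Download the file to the local Hugging Face cache and return its local path."""
    # Imported lazily so that parsing works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = parse_hf_path(path)
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

For example, `download_hf_path("hf://biodatlab/ThonburianTTS/megaF5/mega_vocab.txt")` would return the cached local path of the vocabulary file.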
#### **Loading IPA-Based Models**
```py
import torch

from flowtts.inference import FlowTTSPipeline, ModelConfig, AudioConfig

# Configure the F5-TTS model trained on Thai IPA transcriptions
model_config = ModelConfig(
    model_type="F5",
    checkpoint="hf://biodatlab/ThonburianTTS/megaIPA/model_last_prune.safetensors",
    vocab_file="hf://biodatlab/ThonburianTTS/megaIPA/mega_vocab_ipa.txt",
    vocoder="vocos",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Basic audio settings
audio_config = AudioConfig(
    silence_threshold=-45,
    cfg_strength=2.5,
    speed=1.0,
)

pipeline = FlowTTSPipeline(model_config, audio_config)
```
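Both configurations set `silence_threshold=-45`. Assuming this value is a level in dBFS (decibels relative to full scale), which is a common convention for silence trimming but is not confirmed here, the sketch below shows what that level means as a linear amplitude. The helpers `dbfs_to_amplitude` and `is_silent` are hypothetical, and the peak-based check is a simplification; real trimmers often use RMS over short windows.

```python
def dbfs_to_amplitude(db: float) -> float:
    """Convert a dBFS level to a linear amplitude, where 0 dBFS maps to 1.0."""
    return 10.0 ** (db / 20.0)


def is_silent(samples, threshold_db: float = -45.0) -> bool:
    """Return True if the peak of float samples in [-1, 1] is below the threshold."""
    peak = max((abs(s) for s in samples), default=0.0)
    return peak < dbfs_to_amplitude(threshold_db)


if __name__ == "__main__":
    # -45 dBFS is roughly 0.56% of full scale.
    print(f"{dbfs_to_amplitude(-45):.5f}")
    print(is_silent([0.001, -0.002]))  # very quiet chunk
    print(is_silent([0.2, -0.5]))      # audible chunk
```

Under this assumption, raising the threshold toward 0 (for example, to -30) treats more low-level audio as silence, while lowering it keeps quieter passages.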
## **Example Outputs**
---
## **Developers**
- [Looloo Technology](https://loolootech.com/)
- [Biomedical and Data Lab, Mahidol University](https://biodatlab.github.io/)
## **Citation**
If you use **ThonburianTTS** in your research, please cite:
```bibtex
@INPROCEEDINGS{11320472,
  author={Aung, Thura and Sriwirote, Panyut and Thavornmongkol, Thanachot and Pipatsrisawat, Knot and Achakulvisut, Titipat and Aung, Zaw Htet},
  booktitle={2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)},
  title={ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech},
  year={2025},
  volume={},
  number={},
  pages={1-6},
  keywords={Adaptation models;Codes;Accuracy;Error analysis;Phonetics;Robustness;Natural language processing;Text to speech;Noise measurement;Research and development;Thai text-to-speech;Flow matching;F5-TTS},
  doi={10.1109/iSAI-NLP66160.2025.11320472}
}
```
```
Thura Aung, Panyut Sriwirote, Thanachot Thavornmongkol, Knot Pipatsrisawat, Titipat Achakulvisut, Zaw Htet Aung, "ThonburianTTS: Enhancing Neural Flow Matching Models for Authentic Thai Text-to-Speech", 2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Phuket, Thailand, 2025, pp. 1-6, doi: 10.1109/iSAI-NLP66160.2025.11320472.
```
## **License**
The **models** are released under the [Creative Commons Attribution Non-Commercial ShareAlike 4.0 License (CC BY-NC-SA 4.0)](LICENSE-CC-BY-NC-SA).
## **Acknowledgement**
We thank the NSTDA Supercomputer Center (ThaiSC), project \#pv824003, for providing the computing resources for this work.