---
license: apache-2.0
datasets:
- pnnbao-ump/VieNeu-TTS-1000h
- pnnbao-ump/VieNeuCodec-dataset
- pnnbao-ump/VieNeu-TTS-140h
language:
- vi
base_model:
- neuphonic/neutts-air
pipeline_tag: text-to-speech
---

## Overview

**VieNeu-TTS** is an on-device Vietnamese text-to-speech (TTS) model with **instant voice cloning**. It is trained on about 1,000 hours of high-quality Vietnamese speech and is a significant upgrade over VieNeu-TTS-140h, with the following improvements:

- **Enhanced pronunciation**: more accurate and stable Vietnamese pronunciation
- **Code-switching support**: seamless transitions between Vietnamese and English
- **Better voice cloning**: higher fidelity and speaker consistency
- **Real-time synthesis**: 24 kHz waveform generation on CPU or GPU

VieNeu-TTS-1000h delivers production-ready speech synthesis that runs fully offline.

**Author:** Phạm Nguyễn Ngọc Bảo

## Support This Project

Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting its development:

[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Support-orange?logo=buy-me-a-coffee)](https://buymeacoffee.com/pnnbao)

Your support helps maintain and improve VieNeu-TTS! 🙏
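This card states 24 kHz output, real-time or faster synthesis, and voice cloning from a 3–5 second reference clip. If you want to check the last two against your own hardware and recordings, the minimal standard-library sketch below measures a reference WAV's length and computes the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio, so an RTF below 1.0 means faster than real time. All helper names here are hypothetical and are not part of the VieNeu-TTS API, and treating 3–5 s as a hard limit is an assumption. To use it, time your own synthesis call (for example with `time.perf_counter()`) and pass that time together with the number of samples the model returned.

```python
import io
import wave

# 24 kHz is the output rate stated on this model card.
OUTPUT_SAMPLE_RATE = 24_000

# Reference-clip window suggested for voice cloning (3-5 s); treating it as a
# hard limit is an assumption made for this sketch.
REF_MIN_SECONDS = 3.0
REF_MAX_SECONDS = 5.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV file held in memory, in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(seconds: float) -> bool:
    """True if a reference clip falls inside the suggested 3-5 s window."""
    return REF_MIN_SECONDS <= seconds <= REF_MAX_SECONDS


def real_time_factor(synthesis_seconds: float, num_output_samples: int,
                     sample_rate: int = OUTPUT_SAMPLE_RATE) -> float:
    """Wall-clock synthesis time divided by the duration of the generated audio.

    An RTF below 1.0 means the audio was produced faster than real time.
    """
    if num_output_samples <= 0:
        raise ValueError("num_output_samples must be positive")
    audio_seconds = num_output_samples / sample_rate
    return synthesis_seconds / audio_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16_000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (demo/test helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


if __name__ == "__main__":
    ref = make_silent_wav(4.0)
    duration = wav_duration_seconds(ref)
    print(duration, is_good_reference_length(duration))  # 4.0 True
    # Example: 2.0 s of wall-clock time to generate 10 s of 24 kHz audio.
    print(real_time_factor(2.0, 10 * OUTPUT_SAMPLE_RATE))  # 0.2
```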

---

## Reference Voices

The file names are Vietnamese: *nam*/*nữ* mean male/female, and *miền Bắc*/*miền Nam* mean Northern/Southern.

| File | Gender | Accent | Description |
|------|--------|--------|-------------|
| Bình (nam miền Bắc) | Male | Northern | Male voice, Northern accent |
| Tuyên (nam miền Bắc) | Male | Northern | Male voice, Northern accent |
| Nguyên (nam miền Nam) | Male | Southern | Male voice, Southern accent |
| Sơn (nam miền Nam) | Male | Southern | Male voice, Southern accent |
| Vĩnh (nam miền Nam) | Male | Southern | Male voice, Southern accent |
| Hương (nữ miền Bắc) | Female | Northern | Female voice, Northern accent |
| Ly (nữ miền Bắc) | Female | Northern | Female voice, Northern accent |
| Ngọc (nữ miền Bắc) | Female | Northern | Female voice, Northern accent |
| Đoan (nữ miền Nam) | Female | Southern | Female voice, Southern accent |
| Dung (nữ miền Nam) | Female | Southern | Female voice, Southern accent |

---

## Model Architecture

| Component | Description |
|-----------|-------------|
| Backbone | Qwen 0.5B (chat-format LM) |
| Codec | NeuCodec (supports ONNX and quantization) |
| Output | 24 kHz waveform synthesis |
| Context window | 2,048 tokens, shared between text and speech |
| Watermark | Enabled |
| Training data | VieNeuCodec-dataset, plus Emilia dataset pretraining |

## Features

- High-quality Vietnamese speech
- Instant **voice cloning** from a 3–5 second reference clip
- Fully **offline**
- Real-time or faster synthesis
- Multi-voice reference support
- Python API, CLI, and Gradio interface

## Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| Missing `libespeak` | eSpeak NG system library not installed | Install eSpeak NG |
| GPU out of memory (OOM) | Not enough VRAM | Run on CPU or use a quantized model |
| Poor voice match | Noisy or unclear reference sample | Try a cleaner reference clip |

## License

Apache 2.0

## Citation

```bibtex
@misc{vieneutts2025,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
```

Please also cite the base model:

```bibtex
@misc{neuttsair2025,
  title        = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
  author       = {Neuphonic},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
```