Configuration Parsing
Warning:
Invalid JSON for config file tokenizer_config.json
Overview
VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.
Trained on ~1000 hours of high-quality Vietnamese speech, this model represents a significant upgrade from VieNeu-TTS-140h with the following improvements:
- Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
- Code-switching support: Seamless transitions between Vietnamese and English
- Better voice cloning: Higher fidelity and speaker consistency
- Real-time synthesis: 24 kHz waveform generation on CPU or GPU
VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.
Author: Phạm Nguyễn Ngọc Bảo
Support This Project
Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting the development:
Your support helps maintain and improve VieNeu-TTS! 🙏
Reference Voices
| File | Gender | Accent | Description |
|---|---|---|---|
| Bình (nam miền Bắc) | Male | North | Male voice, North accent |
| Tuyên (nam miền Bắc) | Male | North | Male voice, North accent |
| Nguyên (nam miền Nam) | Male | South | Male voice, South accent |
| Sơn (nam miền Nam) | Male | South | Male voice, South accent |
| Vĩnh (nam miền Nam) | Male | South | Male voice, South accent |
| Hương (nữ miền Bắc) | Female | North | Female voice, North accent |
| Ly (nữ miền Bắc) | Female | North | Female voice, North accent |
| Ngọc (nữ miền Bắc) | Female | North | Female voice, North accent |
| Đoan (nữ miền Nam) | Female | South | Female voice, South accent |
| Dung (nữ miền Nam) | Female | South | Female voice, South accent |
Model Architecture
| Component | Description |
|---|---|
| Backbone | Qwen 0.5B (chat-format LM) |
| Codec | NeuCodec (supports ONNX + quantization) |
| Output | 24 kHz waveform synthesis |
| Context Window | 2048 tokens shared text + speech |
| Watermark | Enabled |
| Training Data | VieNeuCodec-dataset + Emilia dataset pretraining |
Features
- High-quality Vietnamese speech
- Instant voice cloning (3–5 second reference audio)
- Fully offline
- Runs real-time or faster
- Multi-voice reference support
- Python API + CLI + Gradio
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
Missing libespeak |
System dependency | Install eSpeak NG |
| GPU OOM | VRAM too small | Use CPU or quantized model |
| Poor voice match | Bad reference sample | Try a clearer reference clip |
License
Apache 2.0
Citation
@misc{vieneutts2025,
title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
author = {Pham Nguyen Ngoc Bao},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
Please also cite the base model:
@misc{neuttsair2025,
title = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
author = {Neuphonic},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
- Downloads last month
- 64
Model tree for thanhtantran/VieNeu-TTS
Base model
neuphonic/neutts-air