---
license: apache-2.0
datasets:
- pnnbao-ump/VieNeu-TTS-1000h
- pnnbao-ump/VieNeuCodec-dataset
- pnnbao-ump/VieNeu-TTS-140h
language:
- vi
base_model:
- neuphonic/neutts-air
pipeline_tag: text-to-speech
---

## Overview

**VieNeu-TTS** is an advanced on-device Vietnamese Text-to-Speech (TTS) model with **instant voice cloning**.

Trained on ~1000 hours of high-quality Vietnamese speech, this model represents a significant upgrade from VieNeu-TTS-140h with the following improvements:

- **Enhanced pronunciation**: More accurate and stable Vietnamese pronunciation
- **Code-switching support**: Seamless transitions between Vietnamese and English
- **Better voice cloning**: Higher fidelity and speaker consistency
- **Real-time synthesis**: 24 kHz waveform generation on CPU or GPU

VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.

**Author:** Phạm Nguyễn Ngọc Bảo

## Support This Project

Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting the development:

[Buy Me a Coffee](https://buymeacoffee.com/pnnbao)

Your support helps maintain and improve VieNeu-TTS! 🙏

---

## Reference Voices

| File                    | Gender | Accent | Description                |
|-------------------------|--------|--------|----------------------------|
| Bình (nam miền Bắc)     | Male   | North  | Male voice, North accent   |
| Tuyên (nam miền Bắc)    | Male   | North  | Male voice, North accent   |
| Nguyên (nam miền Nam)   | Male   | South  | Male voice, South accent   |
| Sơn (nam miền Nam)      | Male   | South  | Male voice, South accent   |
| Vĩnh (nam miền Nam)     | Male   | South  | Male voice, South accent   |
| Hương (nữ miền Bắc)     | Female | North  | Female voice, North accent |
| Ly (nữ miền Bắc)        | Female | North  | Female voice, North accent |
| Ngọc (nữ miền Bắc)      | Female | North  | Female voice, North accent |
| Đoan (nữ miền Nam)      | Female | South  | Female voice, South accent |
| Dung (nữ miền Nam)      | Female | South  | Female voice, South accent |

---

## Model Architecture

| Component | Description |
|----------|-------------|
| Backbone | Qwen 0.5B (chat-format LM) |
| Codec | NeuCodec (supports ONNX + quantization) |
| Output | 24 kHz waveform synthesis |
| Context Window | 2048 tokens, shared between text and speech |
| Watermark | Enabled |
| Training Data | VieNeuCodec-dataset, plus Emilia dataset pretraining |
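
Because text and speech share a single 2048-token window, whatever the prompt consumes is no longer available for generated speech. The sketch below estimates the remaining budget in seconds. It rests on two assumptions that this card does not state: that the codec emits 50 speech tokens per second of audio, and that the reference clip's codes sit in the same window as the text. The function name and the token counts are illustrative only and are not part of the VieNeu-TTS API.

```python
CONTEXT_WINDOW = 2048      # tokens, from the architecture table
ASSUMED_CODEC_RATE = 50    # speech tokens per second (assumption, not from this card)


def max_generated_seconds(text_tokens: int,
                          reference_seconds: float,
                          codec_rate: int = ASSUMED_CODEC_RATE,
                          context_window: int = CONTEXT_WINDOW) -> float:
    """Estimate how many seconds of speech still fit after the prompt is counted."""
    if text_tokens < 0 or reference_seconds < 0:
        raise ValueError("inputs must be non-negative")
    # Tokens consumed by the encoded reference clip (assumed to share the window).
    reference_tokens = round(reference_seconds * codec_rate)
    remaining = context_window - text_tokens - reference_tokens
    # A prompt that already overflows the window leaves no room for speech.
    return max(remaining, 0) / codec_rate
```

Under these assumptions, `max_generated_seconds(text_tokens=150, reference_seconds=5.0)` leaves roughly 33 seconds of audio, which suggests splitting long inputs into sentence-sized chunks before synthesis.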

## Features

- High-quality Vietnamese speech
- Instant **voice cloning** (3–5 second reference audio)
- Fully **offline**
- Runs real-time or faster
- Multi-voice reference support
- Python API + CLI + Gradio
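
Voice cloning expects a short reference clip (3–5 seconds, per the list above). A minimal pre-flight check using only Python's standard `wave` module might look like the following. The helper names `clip_duration_seconds` and `reference_clip_ok` are hypothetical and not part of the VieNeu-TTS API, and the check only handles uncompressed WAV files.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_clip_ok(path: str, min_s: float = 3.0, max_s: float = 5.0) -> bool:
    """True if the clip length falls inside the suggested 3-5 second range."""
    return min_s <= clip_duration_seconds(path) <= max_s
```

Running `reference_clip_ok("my_voice.wav")` before synthesis gives an early warning when a clip is too short to capture the speaker or long enough to crowd the context window.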

## Troubleshooting

| Issue | Cause | Solution |
|------|-------|----------|
| Missing `libespeak` | System dependency | Install eSpeak NG |
| GPU OOM | VRAM too small | Use CPU or quantized model |
| Poor voice match | Bad reference sample | Try a clearer reference clip |
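
The first row can be checked before the model is loaded. The sketch below looks for an eSpeak executable on `PATH` with `shutil.which`. It assumes the binary is named `espeak-ng` (or `espeak` on older installs) and does not verify that the shared library itself can be loaded; the function name `find_espeak` is hypothetical.

```python
import shutil
from typing import Optional, Sequence


def find_espeak(candidates: Sequence[str] = ("espeak-ng", "espeak")) -> Optional[str]:
    """Return the path of the first eSpeak executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_espeak()
    if location is None:
        print("eSpeak NG not found; install it with your system package manager.")
    else:
        print(f"Found eSpeak at {location}")
```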

## License

Apache 2.0

## Citation

```bibtex
@misc{vieneutts2025,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
```

Please also cite the base model:

```bibtex
@misc{neuttsair2025,
  title        = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
  author       = {Neuphonic},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
```