---
license: apache-2.0
datasets:
- pnnbao-ump/VieNeu-TTS-1000h
- pnnbao-ump/VieNeuCodec-dataset
- pnnbao-ump/VieNeu-TTS-140h
language:
- vi
base_model:
- neuphonic/neutts-air
pipeline_tag: text-to-speech
---
## Overview
**VieNeu-TTS** is an on-device Vietnamese Text-to-Speech (TTS) model with **instant voice cloning**.
Trained on ~1000 hours of high-quality Vietnamese speech, it improves on VieNeu-TTS-140h in the following ways:
- **Enhanced pronunciation**: More accurate and stable Vietnamese pronunciation
- **Code-switching support**: Seamless transitions between Vietnamese and English
- **Better voice cloning**: Higher fidelity and speaker consistency
- **Real-time synthesis**: 24 kHz waveform generation on CPU or GPU
VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.
**Author:** Phạm Nguyễn Ngọc Bảo
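
The real-time claim in the overview can be checked on your own hardware by measuring the real-time factor (RTF): synthesis time divided by the duration of the generated audio. The sketch below assumes only the 24 kHz output rate stated in this card. The `synthesize` callable passed to `timed` is a hypothetical placeholder for whatever inference function you use; it is not part of the VieNeu-TTS API.

```python
import time

SAMPLE_RATE_HZ = 24_000  # output sample rate stated in this model card


def real_time_factor(num_samples: int, synthesis_seconds: float,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return synthesis time / audio duration. A value below 1.0 is faster than real time."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return synthesis_seconds / audio_seconds


def timed(synthesize, *args, **kwargs):
    """Call a synthesis function (hypothetical placeholder) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = synthesize(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, a 2-second clip (48,000 samples) that takes 1 second to generate has an RTF of 0.5.
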
## Support This Project
Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting the development:
[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Support-orange?logo=buy-me-a-coffee)](https://buymeacoffee.com/pnnbao)
Your support helps maintain and improve VieNeu-TTS! 🙏
---
## Reference Voices
| Voice | Gender | Accent | Description |
|-------|--------|--------|-------------|
| Bình (nam miền Bắc) | Male | Northern | Male voice, Northern accent |
| Tuyên (nam miền Bắc) | Male | Northern | Male voice, Northern accent |
| Nguyên (nam miền Nam) | Male | Southern | Male voice, Southern accent |
| Sơn (nam miền Nam) | Male | Southern | Male voice, Southern accent |
| Vĩnh (nam miền Nam) | Male | Southern | Male voice, Southern accent |
| Hương (nữ miền Bắc) | Female | Northern | Female voice, Northern accent |
| Ly (nữ miền Bắc) | Female | Northern | Female voice, Northern accent |
| Ngọc (nữ miền Bắc) | Female | Northern | Female voice, Northern accent |
| Đoan (nữ miền Nam) | Female | Southern | Female voice, Southern accent |
| Dung (nữ miền Nam) | Female | Southern | Female voice, Southern accent |
---
## Model Architecture
| Component | Description |
|----------|-------------|
| Backbone | Qwen 0.5B (chat-format LM) |
| Codec | NeuCodec (supports ONNX + quantization) |
| Output | 24 kHz waveform synthesis |
| Context Window | 2048 tokens, shared between text and speech |
| Watermark | Enabled |
| Training Data | VieNeuCodec-dataset, plus Emilia dataset pretraining |
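
Because the 2048-token context window is shared between text and speech, longer prompts leave less room for generated audio. The sketch below estimates that budget. It rests on two assumptions that this card does not state: that the codec emits 50 speech tokens per second of audio, and that the reference clip's speech tokens occupy the same window as the text. Adjust both to match the actual tokenizer and prompt format.

```python
import math

CONTEXT_WINDOW_TOKENS = 2048   # from the architecture table above
CODEC_TOKENS_PER_SECOND = 50   # ASSUMPTION: codec token rate is not stated in this card


def speech_token_budget(text_tokens: int, reference_seconds: float,
                        context_window: int = CONTEXT_WINDOW_TOKENS,
                        tokens_per_second: int = CODEC_TOKENS_PER_SECOND) -> int:
    """Tokens left for generated speech after the text and the reference clip (assumed in-context)."""
    reference_tokens = math.ceil(reference_seconds * tokens_per_second)
    remaining = context_window - text_tokens - reference_tokens
    return max(remaining, 0)


def max_output_seconds(text_tokens: int, reference_seconds: float,
                       tokens_per_second: int = CODEC_TOKENS_PER_SECOND) -> float:
    """Upper bound on generated audio length, in seconds, under the assumptions above."""
    budget = speech_token_budget(text_tokens, reference_seconds,
                                 tokens_per_second=tokens_per_second)
    return budget / tokens_per_second
```

Under these assumptions, 100 text tokens plus a 4-second reference clip leave 1748 tokens, or roughly 35 seconds of output. If your input would exceed the budget, splitting long text into sentences and synthesizing them separately is one way to stay inside the window.
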
## Features
- High-quality Vietnamese speech
- Instant **voice cloning** (3–5 second reference audio)
- Fully **offline**
- Runs in real time or faster
- Multi-voice reference support
- Python API, CLI, and Gradio interface
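
Voice cloning expects a reference clip of about 3–5 seconds, so it can help to check a clip's length before using it. The sketch below does this for uncompressed WAV files using only the Python standard library. The function names and the strict 3–5 second bounds are illustrative choices, not part of the VieNeu-TTS API.

```python
import wave

MIN_SECONDS = 3.0  # lower bound of the reference length suggested in this card
MAX_SECONDS = 5.0  # upper bound of the reference length suggested in this card


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str) -> list:
    """Return a list of human-readable problems; an empty list means the length looks fine."""
    problems = []
    duration = clip_duration_seconds(path)
    if duration < MIN_SECONDS:
        problems.append(f"clip is {duration:.2f}s, shorter than {MIN_SECONDS:.0f}s")
    elif duration > MAX_SECONDS:
        problems.append(f"clip is {duration:.2f}s, longer than {MAX_SECONDS:.0f}s")
    return problems
```

A clip outside the range is not necessarily unusable; treat the result as a prompt to trim or re-record rather than a hard failure.
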
## Troubleshooting
| Issue | Cause | Solution |
|------|-------|----------|
| Missing `libespeak` | System dependency not installed | Install eSpeak NG |
| GPU OOM | VRAM too small | Use CPU or quantized model |
| Poor voice match | Bad reference sample | Try a clearer reference clip |
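
For the missing `libespeak` case, a quick way to confirm whether eSpeak NG is visible to Python is to look for both the command-line executable and the shared library. The sketch below uses only the standard library. The names it searches for (`espeak-ng`, `espeak`) are the common ones, but they are an assumption and may differ on your platform.

```python
import shutil
from ctypes.util import find_library


def find_espeak() -> dict:
    """Look for the eSpeak NG executable and shared library; either value may be None."""
    executable = shutil.which("espeak-ng") or shutil.which("espeak")
    library = find_library("espeak-ng") or find_library("espeak")
    return {"executable": executable, "library": library}


def espeak_available() -> bool:
    """Return True if either the executable or the shared library was found."""
    found = find_espeak()
    return found["executable"] is not None or found["library"] is not None


if __name__ == "__main__":
    print(find_espeak())
    if not espeak_available():
        print("eSpeak NG not found; install it with your system package manager.")
```

If the executable is found but the library is not, the package may be installed in a location the dynamic linker does not search.
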
## License
Apache 2.0
## Citation
```bibtex
@misc{vieneutts2025,
title = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
author = {Pham Nguyen Ngoc Bao},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
```
Please also cite the base model:
```bibtex
@misc{neuttsair2025,
title = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
author = {Neuphonic},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
```