---
license: apache-2.0
datasets:
- pnnbao-ump/VieNeu-TTS-1000h
- pnnbao-ump/VieNeuCodec-dataset
- pnnbao-ump/VieNeu-TTS-140h
language:
- vi
base_model:
- neuphonic/neutts-air
pipeline_tag: text-to-speech
---

## Overview

**VieNeu-TTS** is an advanced on-device Vietnamese Text-to-Speech (TTS) model with **instant voice cloning**.  

Trained on ~1000 hours of high-quality Vietnamese speech, this model is a significant upgrade over VieNeu-TTS-140h, with the following improvements:

- **Enhanced pronunciation**: More accurate and stable Vietnamese pronunciation
- **Code-switching support**: Seamless transitions between Vietnamese and English
- **Better voice cloning**: Higher fidelity and speaker consistency
- **Real-time synthesis**: 24 kHz waveform generation on CPU or GPU

VieNeu-TTS-1000h delivers production-ready speech synthesis fully offline.
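To check the real-time claim on your own hardware, a common metric is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio. The sketch below computes it for 24 kHz output. `synthesize` is a hypothetical stand-in for whatever VieNeu-TTS call you use and is assumed to return a 1-D sequence of samples.

```python
# Minimal sketch: real-time factor (RTF) for a 24 kHz TTS model.
# `synthesize` is a HYPOTHETICAL placeholder for your actual synthesis call;
# it is assumed to return a 1-D sequence of samples at SAMPLE_RATE.
import time
from typing import Callable, Sequence

SAMPLE_RATE = 24_000  # VieNeu-TTS outputs 24 kHz audio


def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """RTF = synthesis time / audio duration; RTF < 1.0 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("generated audio is empty")
    return synthesis_seconds / audio_seconds


def timed_rtf(synthesize: Callable[[str], Sequence[float]], text: str) -> float:
    """Time one synthesis call and return its RTF."""
    start = time.perf_counter()
    wav = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(wav))


if __name__ == "__main__":
    # 2 s of compute producing 4 s of 24 kHz audio
    print(real_time_factor(2.0, 4 * SAMPLE_RATE))  # 0.5
```

An RTF below 1.0 means faster than real time. Consider timing a warm-up call separately, since the first call may include model loading.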

**Author:** Phạm Nguyễn Ngọc Bảo

## Support This Project

Training high-quality TTS models requires significant GPU resources and compute time. If you find this model useful, please consider supporting the development:

[![Buy Me a Coffee](https://img.shields.io/badge/Buy%20Me%20a%20Coffee-Support-orange?logo=buy-me-a-coffee)](https://buymeacoffee.com/pnnbao)

Your support helps maintain and improve VieNeu-TTS! 🙏

---

## Reference Voices

| Voice                   | Gender | Accent | Description        |
|-------------------------|--------|--------|--------------------|
| Bình (nam miền Bắc)     | Male   | North  | Male voice, North accent |
| Tuyên (nam miền Bắc)    | Male   | North  | Male voice, North accent |
| Nguyên (nam miền Nam)   | Male   | South  | Male voice, South accent |
| Sơn (nam miền Nam)      | Male   | South  | Male voice, South accent |
| Vĩnh (nam miền Nam)     | Male   | South  | Male voice, South accent |
| Hương (nữ miền Bắc)     | Female | North  | Female voice, North accent |
| Ly (nữ miền Bắc)        | Female | North  | Female voice, North accent |
| Ngọc (nữ miền Bắc)      | Female | North  | Female voice, North accent |
| Đoan (nữ miền Nam)      | Female | South  | Female voice, South accent |
| Dung (nữ miền Nam)      | Female | South  | Female voice, South accent |
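If you script against these reference voices, a small lookup by gender and accent can help. The sketch below is a hypothetical helper, not part of VieNeu-TTS: it only mirrors the voice labels in the table above, and how those labels map to actual reference audio files in the project is an assumption you should check.

```python
# Hypothetical helper: filter the reference voices listed in the table by
# gender and accent. Labels are copied from the table; mapping them to real
# reference audio files is an assumption left to the caller.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceVoice:
    label: str
    gender: str  # "Male" or "Female"
    accent: str  # "North" or "South"


VOICES: List[ReferenceVoice] = [
    ReferenceVoice("Bình (nam miền Bắc)", "Male", "North"),
    ReferenceVoice("Tuyên (nam miền Bắc)", "Male", "North"),
    ReferenceVoice("Nguyên (nam miền Nam)", "Male", "South"),
    ReferenceVoice("Sơn (nam miền Nam)", "Male", "South"),
    ReferenceVoice("Vĩnh (nam miền Nam)", "Male", "South"),
    ReferenceVoice("Hương (nữ miền Bắc)", "Female", "North"),
    ReferenceVoice("Ly (nữ miền Bắc)", "Female", "North"),
    ReferenceVoice("Ngọc (nữ miền Bắc)", "Female", "North"),
    ReferenceVoice("Đoan (nữ miền Nam)", "Female", "South"),
    ReferenceVoice("Dung (nữ miền Nam)", "Female", "South"),
]


def find_voices(gender: Optional[str] = None,
                accent: Optional[str] = None) -> List[ReferenceVoice]:
    """Return voices matching the filters; None means 'any'."""
    return [
        v for v in VOICES
        if (gender is None or v.gender == gender)
        and (accent is None or v.accent == accent)
    ]


if __name__ == "__main__":
    for voice in find_voices(gender="Female", accent="South"):
        print(voice.label)  # Đoan (nữ miền Nam), then Dung (nữ miền Nam)
```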

---

## Model Architecture

| Component | Description |
|----------|-------------|
| Backbone | Qwen 0.5B (chat-format LM) |
| Codec    | NeuCodec (supports ONNX + quantization) |
| Output   | 24 kHz waveform synthesis |
| Context Window | 2048 tokens, shared between text and speech tokens |
| Watermark | Enabled |
| Training Data | VieNeuCodec-dataset, plus pretraining on the Emilia dataset |
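Because the 2048-token window is shared, a long reference clip or input text leaves less room for generated speech. Assuming that, as in NeuTTS-style cloning, the reference transcript, the reference clip's codec tokens and the input text all sit in the prompt, the remaining budget can be estimated as below. The rate of 50 speech tokens per second and the `prompt_overhead` parameter are assumptions for illustration, not values confirmed by this card.

```python
# Minimal sketch of a context-budget estimate for a shared text+speech window.
# TOKENS_PER_SECOND is an ASSUMPTION (check the NeuCodec model card), and
# prompt_overhead is a HYPOTHETICAL allowance for chat-format/control tokens.
CONTEXT_WINDOW = 2048       # from the architecture table
TOKENS_PER_SECOND = 50      # assumed speech tokens per second of audio


def max_output_seconds(ref_text_tokens: int, ref_audio_seconds: float,
                       input_text_tokens: int, prompt_overhead: int = 0,
                       window: int = CONTEXT_WINDOW,
                       tokens_per_second: int = TOKENS_PER_SECOND) -> float:
    """Seconds of new speech that still fit after the prompt fills the window."""
    ref_speech_tokens = round(ref_audio_seconds * tokens_per_second)
    used = ref_text_tokens + ref_speech_tokens + input_text_tokens + prompt_overhead
    remaining = window - used
    if remaining <= 0:
        return 0.0  # prompt alone already exhausts the window
    return remaining / tokens_per_second


if __name__ == "__main__":
    # 5 s reference (250 tokens) + 40 ref-text tokens + 60 input-text tokens
    print(max_output_seconds(40, 5.0, 60))  # 33.96
```

Under these assumptions, a 3–5 second reference costs only about 150–250 tokens, which is one reason short reference clips are practical.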

## Features

- High-quality Vietnamese speech
- Instant **voice cloning** from a 3–5 second reference clip
- Fully **offline**
- Real-time or faster synthesis
- Multi-voice reference support
- Python API, CLI, and Gradio interface
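The recommended 3–5 second reference length can be checked before cloning. The sketch below is a hypothetical pre-flight helper, not part of VieNeu-TTS. It uses only Python's standard-library `wave` module, so it reads uncompressed PCM WAV files only, and its bounds simply mirror the recommendation above.

```python
# Hypothetical pre-flight check for voice-cloning reference clips.
# Standard-library `wave` only: works for uncompressed PCM WAV, not MP3/FLAC.
import wave
from typing import BinaryIO, Tuple, Union

MIN_REF_SECONDS = 3.0  # lower bound of the 3–5 s recommendation
MAX_REF_SECONDS = 5.0  # upper bound


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Duration of a PCM WAV given a file path or a binary file object."""
    with wave.open(source, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference_clip(source: Union[str, BinaryIO],
                         min_s: float = MIN_REF_SECONDS,
                         max_s: float = MAX_REF_SECONDS) -> Tuple[bool, float]:
    """Return (is_within_range, duration_in_seconds)."""
    duration = wav_duration_seconds(source)
    return min_s <= duration <= max_s, duration
```

For example, `ok, seconds = check_reference_clip("reference.wav")`, where `reference.wav` is a placeholder path; if `ok` is false, trim or re-record the clip.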

## Troubleshooting

| Issue | Cause | Solution |
|------|-------|----------|
| Missing `libespeak` | System dependency | Install eSpeak NG |
| GPU OOM | Insufficient VRAM | Run on CPU or use a quantized model |
| Poor voice match | Noisy or unclear reference clip | Use a clean, clearly spoken 3–5 second reference clip |
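For the missing `libespeak` case, a quick check like the one below can tell you whether eSpeak NG is discoverable on the current machine. It is a sketch: the names probed (`espeak-ng`, `espeak`) are assumptions about typical installs, and finding them does not guarantee that whatever phonemization layer VieNeu-TTS uses will load them.

```python
# Minimal sketch: is eSpeak NG discoverable? The probed names are ASSUMPTIONS
# about common installs; a hit does not guarantee the TTS stack will load it.
import ctypes.util
import shutil
from typing import Dict, Optional


def espeak_status() -> Dict[str, Optional[str]]:
    """Map each probe to its discovered location, or None if not found."""
    return {
        "espeak-ng binary": shutil.which("espeak-ng"),
        "libespeak-ng": ctypes.util.find_library("espeak-ng"),
        "libespeak": ctypes.util.find_library("espeak"),
    }


def espeak_available() -> bool:
    return any(location is not None for location in espeak_status().values())


if __name__ == "__main__":
    for name, location in espeak_status().items():
        print(f"{name}: {location or 'not found'}")
    if not espeak_available():
        print("eSpeak NG not found. Debian/Ubuntu: sudo apt install espeak-ng; "
              "macOS (Homebrew): brew install espeak-ng")
```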

## License

Apache 2.0

## Citation

```bibtex
@misc{vieneutts2025,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
```

Please also cite the base model:

```bibtex
@misc{neuttsair2025,
  title        = {NeuTTS Air: On-Device Speech Language Model with Instant Voice Cloning},
  author       = {Neuphonic},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/neuphonic/neutts-air}}
}
```