VieNeu-TTS
VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.
Voice Cloning: All model variants (including GGUF) support instant voice cloning with just 3-5 seconds of reference audio.
This project features two core architectures trained on the VieNeu-TTS-1000h dataset:
- VieNeu-TTS (0.5B): An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
- VieNeu-TTS-0.3B: A specialized model trained from scratch, delivering 2x faster inference and ultra-low latency.
These represent a significant upgrade from the previous VieNeu-TTS-140h with the following improvements:
- Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
- Code-switching support: Seamless transitions between Vietnamese and English
- Better voice cloning: Higher fidelity and speaker consistency
- Real-time synthesis: 24 kHz waveform generation on CPU or GPU
- Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec
VieNeu-TTS delivers production-ready speech synthesis fully offline.
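The 24 kHz output rate matters when you save synthesized waveforms yourself. Below is a minimal sketch of writing 16-bit mono PCM at that rate using only the Python standard library; the sine tone merely stands in for model output and is not part of the VieNeu-TTS API:

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # VieNeu-TTS generates 24 kHz waveforms

def write_sine(path: str, seconds: float = 0.1, freq: float = 440.0) -> None:
    """Write a mono 16-bit PCM WAV at the model's 24 kHz output rate.

    The sine tone is a placeholder for synthesized audio.
    """
    n = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)
```

Any audio you write out should keep the 24 kHz rate; resampling afterwards is possible but lossy.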
Author: Phạm Nguyễn Ngọc Bảo
🔬 Model Overview
- Backbone:
- VieNeu-TTS (0.5B): Qwen-0.5B fine-tuned from NeuTTS Air.
- VieNeu-TTS-0.3B: Custom 0.3B model trained from scratch, optimized for extreme speed (2x faster).
- Audio codec: NeuCodec (torch implementation; ONNX & quantized variants supported)
- Context window: 2,048 tokens shared by prompt text and speech tokens
- Output watermark: Enabled by default
- Training data: VieNeu-TTS-1000h — 443,641 curated Vietnamese samples (used for both variants).
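Because the 2,048-token window is shared between the text prompt and the generated speech tokens, longer prompts leave less room for audio. A small helper illustrating the arithmetic (a sketch; the codec's tokens-per-second rate is not assumed here, so the budget is in tokens, not seconds):

```python
CONTEXT_WINDOW = 2048  # shared by prompt text and speech tokens

def speech_token_budget(prompt_tokens: int) -> int:
    """Return the tokens left for speech after the text prompt is encoded."""
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone fills the context window")
    return CONTEXT_WINDOW - prompt_tokens
```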
Model Variants
| Model | Format | Device | Quality | Speed |
|---|---|---|---|---|
| VieNeu-TTS | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐ | Very Fast with lmdeploy |
| VieNeu-TTS-0.3B | PyTorch | GPU/CPU | ⭐⭐⭐⭐ | Ultra Fast (2x) |
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very Fast |
| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Ultra Fast (1.5x) |
| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Extreme Speed (2x) |
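For scripts that select a checkpoint automatically, the table can be mirrored as plain data. This mapping is illustrative only, following the table and the recommendations below; it is not part of a packaged API:

```python
# Illustrative mapping of the variant table; not a VieNeu-TTS API.
VARIANTS = {
    "VieNeu-TTS":              "PyTorch",
    "VieNeu-TTS-0.3B":         "PyTorch",
    "VieNeu-TTS-q8-gguf":      "GGUF Q8",
    "VieNeu-TTS-q4-gguf":      "GGUF Q4",
    "VieNeu-TTS-0.3B-q8-gguf": "GGUF Q8",
    "VieNeu-TTS-0.3B-q4-gguf": "GGUF Q4",
}

def pick_variant(device: str, prefer: str = "quality") -> str:
    """Pick a variant: GPU favors the PyTorch 0.5B model; on CPU,
    Q8 favors quality and Q4 favors speed."""
    if device == "gpu":
        return "VieNeu-TTS"
    return "VieNeu-TTS-0.3B-q8-gguf" if prefer == "quality" else "VieNeu-TTS-0.3B-q4-gguf"
```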
Recommendations:
- GPU users: use `VieNeu-TTS` (PyTorch) for the best quality.
- CPU users: use `VieNeu-TTS-0.3B-q4-gguf` for the fastest inference, or `VieNeu-TTS-0.3B-q8-gguf` for the best CPU quality.
- Streaming: only GGUF models support streaming inference (requires `llama-cpp-python >= 0.3.16`).
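Since streaming is gated on `llama-cpp-python >= 0.3.16`, a startup check lets a script fail fast instead of erroring mid-synthesis. A sketch using only the standard library:

```python
from importlib.metadata import PackageNotFoundError, version

MIN_STREAMING = (0, 3, 16)  # minimum llama-cpp-python for GGUF streaming

def gguf_streaming_supported() -> bool:
    """Return True if an installed llama-cpp-python meets the streaming minimum."""
    # Try both spellings of the distribution name for safety.
    for name in ("llama-cpp-python", "llama_cpp_python"):
        try:
            installed = version(name)
            break
        except PackageNotFoundError:
            continue
    else:
        return False
    try:
        parts = tuple(int(p) for p in installed.split(".")[:3])
    except ValueError:
        return False  # non-numeric version tags we can't compare
    return parts >= MIN_STREAMING
```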
✅ Todo & Status
- Publish safetensor artifacts
- Release GGUF Q4 / Q8 models
- Release datasets (1000h and 140h)
- Enable streaming on GPU
- Provide Dockerized setup
- Release fine-tuning code
🏁 Getting Started
1. Clone the repository
```
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
```
2. Install eSpeak NG (Required)
Phonemizer requires eSpeak NG to function.
- Windows: download the installer from the eSpeak NG Releases page (the `.msi` installer is recommended).
- macOS: `brew install espeak`
- Ubuntu/Debian: `sudo apt install espeak-ng`
- Arch Linux: `paru -S aur/espeak-ng`
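Because the phonemizer needs eSpeak NG at runtime, a quick preflight check is useful before loading the model. A sketch using the standard library; binary names vary by platform, so both common ones are tried:

```python
import shutil

def espeak_available() -> bool:
    """Return True if an eSpeak / eSpeak NG binary is on PATH."""
    return any(shutil.which(name) for name in ("espeak-ng", "espeak"))
```

Note that on Windows, phonemizer can also locate the eSpeak NG library via the `PHONEMIZER_ESPEAK_LIBRARY` environment variable, so a missing PATH entry is not necessarily fatal.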
3. Environment Setup (Choose ONE method)
Method 1: Standard with uv (Recommended)
This is the fastest and most reliable way to manage dependencies.
A. Install uv (if you haven't already):
- Windows: `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"`
- Linux/macOS: `curl -LsSf https://astral.sh/uv/install.sh | sh`
B. Choose your hardware:
Option A: For GPU Users (NVIDIA 30xx/40xx/50xx)
Update your NVIDIA drivers and install the CUDA Toolkit. This project targets CUDA 12.8, so make sure your driver is up to date (supports CUDA 12.8 or newer) to avoid compatibility issues, especially on the RTX 30 series.
To use `lmdeploy`, you MUST install the NVIDIA GPU Computing Toolkit: https://developer.nvidia.com/cuda-downloads
```
uv sync
```
Option B: For CPU-only Users
- Switch to CPU configuration:
# Windows: ren pyproject.toml pyproject.toml.bak copy pyproject.toml.cpu pyproject.toml # Linux/macOS: mv pyproject.toml pyproject.toml.bak cp pyproject.toml.cpu pyproject.toml - Install dependencies:
uv sync
C. Run the Application:
```
uv run gradio_app.py
```
Then access the Web UI at http://127.0.0.1:7860.
Method 2: Automatic with Makefile (Alternative)
Best if you have make installed (standard on Linux/macOS, or via Git Bash on Windows). It handles configuration swaps automatically.
- Setup GPU: `make setup-gpu`
- Setup CPU: `make setup-cpu`
- Run the demo: `make demo`
Then access the Web UI at http://127.0.0.1:7860.
🐋 Docker Deployment
For a quick start or production deployment without manually installing dependencies, use Docker.
Quick Start
Copy `.env.example` to `.env`:
```
cp .env.example .env
```
Build and start the container:
```
# Run with CPU
docker compose --profile cpu up

# Run with GPU (requires the NVIDIA Container Toolkit)
docker compose --profile gpu up
```
Access the Web UI at http://localhost:7860.
For detailed deployment instructions, including production setup, see docs/Deploy.md.
📦 Project Structure
```
VieNeu-TTS/
├── examples/
│   ├── infer_long_text.py     # CLI for long-form synthesis (chunked)
│   └── sample_long_text.txt   # Example paragraph for testing
├── gradio_app.py              # Local Gradio web demo with LMDeploy support
├── main.py                    # Basic batch inference script
├── config.yaml                # Configuration for models, codecs, and voices
├── output_audio/              # Generated audio (created when running scripts)
├── sample/                    # Reference voices (audio + transcript + codes)
│   ├── Bình (nam miền Bắc).wav/txt/pt
│   ├── Đoan (nữ miền Nam).wav/txt/pt
│   ├── Dung (nữ miền Nam).wav/txt/pt
│   ├── Hương (nữ miền Bắc).wav/txt/pt
│   ├── Ly (nữ miền Bắc).wav/txt/pt
│   ├── Ngọc (nữ miền Bắc).wav/txt/pt
│   ├── Nguyên (nam miền Nam).wav/txt/pt
│   ├── Sơn (nam miền Nam).wav/txt/pt
│   ├── Tuyên (nam miền Bắc).wav/txt/pt
│   └── Vĩnh (nam miền Nam).wav/txt/pt
├── utils/
│   ├── __init__.py
│   ├── core_utils.py          # Text chunking utilities
│   ├── normalize_text.py      # Vietnamese text normalization pipeline
│   ├── phonemize_text.py      # Text to phoneme conversion
│   └── phoneme_dict.json      # Phoneme dictionary
├── vieneu_tts/
│   ├── __init__.py            # Exports VieNeuTTS and FastVieNeuTTS
│   └── vieneu_tts.py          # Core implementation (VieNeuTTS & FastVieNeuTTS)
├── README.md
├── requirements.txt           # Basic dependencies (legacy)
├── pyproject.toml             # Project configuration with full dependencies (UV)
└── uv.lock                    # UV lock file for dependency management
```
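The long-form path (`examples/infer_long_text.py` with the chunking helpers in `utils/core_utils.py`) splits text before synthesis. A minimal sentence-based chunker in the same spirit; this is a sketch, not the repository's actual implementation:

```python
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack sentences into chunks no longer than max_chars.

    A single sentence longer than max_chars becomes its own
    (oversized) chunk rather than being split mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized independently and the resulting waveforms concatenated.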
📚 References
- GitHub Repository
- Hugging Face Model (0.5B)
- Hugging Face Model (0.3B)
- VieNeuTTS Fine-tuning Guide
- VieNeuCodec dataset
📄 License
- VieNeu-TTS (0.5B): Original terms (Apache 2.0).
- VieNeu-TTS-0.3B: Released under CC BY-NC 4.0 (Non-Commercial).
- This version is currently experimental.
- Commercial use is prohibited without authorization. Please contact the author for commercial licensing.
📑 Citation
```bibtex
@misc{vieneutts2026,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}
```
🤝 Contributing
Contributions are welcome!
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m "Add amazing feature"`
- Push the branch: `git push origin feature/amazing-feature`
- Open a pull request
📞 Support
- GitHub Issues: github.com/pnnbao97/VieNeu-TTS/issues
- Hugging Face: huggingface.co/pnnbao-ump
- Discord: Join us
- Facebook: Phạm Nguyễn Ngọc Bảo
🙏 Acknowledgements
This project builds upon NeuTTS Air for the original 0.5B model. The 0.3B version is a custom architecture trained from scratch using the VieNeu-TTS-1000h dataset.
Made with ❤️ for the Vietnamese TTS community