
VieNeu-TTS

VieNeu-TTS is an advanced on-device Vietnamese Text-to-Speech (TTS) model with instant voice cloning.

Voice Cloning: All model variants (including GGUF) support instant voice cloning with just 3-5 seconds of reference audio.
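Since cloning quality depends on the reference clip, it can help to verify a clip is roughly in the recommended 3-5 second range before using it. The helper below is a hypothetical convenience (not part of the VieNeu-TTS API), using only the Python standard library:

```python
import wave

def reference_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def is_good_reference(path: str, lo: float = 3.0, hi: float = 5.0) -> bool:
    """Check that a reference clip falls in the recommended 3-5 s window."""
    return lo <= reference_duration_seconds(path) <= hi
```

For example, `is_good_reference("sample/Ly (nữ miền Bắc).wav")` checks one of the bundled reference voices.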

This project features two core architectures trained on the VieNeu-TTS-1000h dataset:

  • VieNeu-TTS (0.5B): An enhanced model fine-tuned from the NeuTTS Air architecture for maximum stability.
  • VieNeu-TTS-0.3B: A specialized model trained from scratch, delivering 2x faster inference and ultra-low latency.

These models represent a significant upgrade over the previous VieNeu-TTS-140h, with the following improvements:

  • Enhanced pronunciation: More accurate and stable Vietnamese pronunciation
  • Code-switching support: Seamless transitions between Vietnamese and English
  • Better voice cloning: Higher fidelity and speaker consistency
  • Real-time synthesis: 24 kHz waveform generation on CPU or GPU
  • Multiple model formats: Support for PyTorch, GGUF Q4/Q8 (CPU optimized), and ONNX codec

VieNeu-TTS delivers production-ready speech synthesis fully offline.

Author: Phạm Nguyễn Ngọc Bảo


🔬 Model Overview

  • Backbone:
    • VieNeu-TTS (0.5B): Qwen-0.5B fine-tuned from NeuTTS Air.
    • VieNeu-TTS-0.3B: Custom 0.3B model trained from scratch, optimized for extreme speed (2x faster).
  • Audio codec: NeuCodec (torch implementation; ONNX & quantized variants supported)
  • Context window: 2,048 tokens shared by prompt text and speech tokens
  • Output watermark: Enabled by default
  • Training data: VieNeu-TTS-1000h (443,641 curated Vietnamese samples), used to train both models.

Model Variants

| Model | Format | Device | Quality | Speed |
|---|---|---|---|---|
| VieNeu-TTS | PyTorch | GPU/CPU | ⭐⭐⭐⭐⭐ | Very fast (with lmdeploy) |
| VieNeu-TTS-0.3B | PyTorch | GPU/CPU | ⭐⭐⭐⭐ | Ultra fast (2x) |
| VieNeu-TTS-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Fast |
| VieNeu-TTS-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Very fast |
| VieNeu-TTS-0.3B-q8-gguf | GGUF Q8 | CPU/GPU | ⭐⭐⭐⭐ | Ultra fast (1.5x) |
| VieNeu-TTS-0.3B-q4-gguf | GGUF Q4 | CPU/GPU | ⭐⭐⭐ | Extreme speed (2x) |

Recommendations:

  • GPU users: Use VieNeu-TTS (PyTorch) for best quality.
  • CPU users: Use VieNeu-TTS-0.3B-q4-gguf for fastest inference, or VieNeu-TTS-0.3B-q8-gguf for best CPU quality.
  • Streaming: Only GGUF models support streaming inference (requires llama-cpp-python >= 0.3.16).
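The recommendations above can be encoded as a small helper. The mapping below is illustrative only (the model names match the variant table, but this function is not part of the project):

```python
def recommend_variant(device: str, prefer_speed: bool = False) -> str:
    """Map hardware and preference to a recommended model variant."""
    if device == "gpu":
        # Best quality; very fast when served with lmdeploy.
        return "VieNeu-TTS"
    if device == "cpu":
        # Q4 trades a little quality for the fastest CPU inference.
        return "VieNeu-TTS-0.3B-q4-gguf" if prefer_speed else "VieNeu-TTS-0.3B-q8-gguf"
    raise ValueError(f"unknown device: {device!r}")
```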

✅ Todo & Status

  • Publish safetensor artifacts
  • Release GGUF Q4 / Q8 models
  • Release datasets (1000h and 140h)
  • Enable streaming on GPU
  • Provide Dockerized setup
  • Release fine-tuning code

🏁 Getting Started

1. Clone the repository

git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS

2. Install eSpeak NG (Required)

Phonemizer requires eSpeak NG to function.

  • Windows: Download installer from eSpeak NG Releases (Recommended: .msi).
  • macOS: brew install espeak
  • Ubuntu/Debian: sudo apt install espeak-ng
  • Arch Linux: paru -S aur/espeak-ng

3. Environment Setup (Choose ONE method)

Method 1: Standard with uv (Recommended)

This is the fastest and most reliable way to manage dependencies.

A. Install uv (If you haven't already):

  • Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  • Linux/macOS: curl -LsSf https://astral.sh/uv/install.sh | sh

B. Choose your hardware:

Option A: For GPU Users (NVIDIA 30xx/40xx/50xx)

Update your NVIDIA drivers and install the CUDA Toolkit. This project uses CUDA 12.8; make sure your NVIDIA driver is up to date (supports CUDA 12.8 or newer) to avoid compatibility issues, especially on the RTX 30 series.

To use lmdeploy, you MUST install the NVIDIA GPU Computing Toolkit: https://developer.nvidia.com/cuda-downloads.

uv sync

Option B: For CPU-only Users

  1. Switch to CPU configuration:
    # Windows:
    ren pyproject.toml pyproject.toml.bak
    copy pyproject.toml.cpu pyproject.toml
    
    # Linux/macOS:
    mv pyproject.toml pyproject.toml.bak
    cp pyproject.toml.cpu pyproject.toml
    
  2. Install dependencies:
    uv sync
    

C. Run the Application:

uv run gradio_app.py

Then access the Web UI at http://127.0.0.1:7860.


Method 2: Automatic with Makefile (Alternative)

Best if you have make installed (standard on Linux/macOS, or via Git Bash on Windows). It handles configuration swaps automatically.

  • Setup GPU: make setup-gpu
  • Setup CPU: make setup-cpu
  • Run Demo: make demo

Then access the Web UI at http://127.0.0.1:7860.



🐋 Docker Deployment

For a quick start or production deployment without manually installing dependencies, use Docker.

Quick Start

Copy .env.example to .env

cp .env.example .env

Build and start container

# Run with CPU
docker compose --profile cpu up

# Run with GPU (requires NVIDIA Container Toolkit)
docker compose --profile gpu up

Access the Web UI at http://localhost:7860.

For detailed deployment instructions, including production setup, see docs/Deploy.md.


📦 Project Structure

VieNeu-TTS/
├── examples/
│   ├── infer_long_text.py     # CLI for long-form synthesis (chunked)
│   └── sample_long_text.txt   # Example paragraph for testing
├── gradio_app.py              # Local Gradio web demo with LMDeploy support
├── main.py                    # Basic batch inference script
├── config.yaml                # Configuration for models, codecs, and voices
├── output_audio/              # Generated audio (created when running scripts)
├── sample/                    # Reference voices (audio + transcript + codes)
│   ├── Bình (nam miền Bắc).wav/txt/pt
│   ├── Đoan (nữ miền Nam).wav/txt/pt
│   ├── Dung (nữ miền Nam).wav/txt/pt
│   ├── Hương (nữ miền Bắc).wav/txt/pt
│   ├── Ly (nữ miền Bắc).wav/txt/pt
│   ├── Ngọc (nữ miền Bắc).wav/txt/pt
│   ├── Nguyên (nam miền Nam).wav/txt/pt
│   ├── Sơn (nam miền Nam).wav/txt/pt
│   ├── Tuyên (nam miền Bắc).wav/txt/pt
│   └── Vĩnh (nam miền Nam).wav/txt/pt
├── utils/
│   ├── __init__.py
│   ├── core_utils.py          # Text chunking utilities
│   ├── normalize_text.py      # Vietnamese text normalization pipeline
│   ├── phonemize_text.py      # Text to phoneme conversion
│   └── phoneme_dict.json      # Phoneme dictionary
├── vieneu_tts/
│   ├── __init__.py            # Exports VieNeuTTS and FastVieNeuTTS
│   └── vieneu_tts.py          # Core VieNeuTTS implementation (VieNeuTTS & FastVieNeuTTS)
├── README.md
├── requirements.txt           # Basic dependencies (legacy)
├── pyproject.toml             # Project configuration with full dependencies (UV)
└── uv.lock                    # UV lock file for dependency management
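examples/infer_long_text.py splits long input into chunks before synthesis. The actual logic lives in utils/core_utils.py; the sketch below is a simplified, illustrative version (sentence-boundary splitting with a character budget), not the project's real implementation:

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    # Split after sentence-final punctuation; ellipsis included for Vietnamese prose.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized independently and the waveforms concatenated, which keeps every generation well inside the 2,048-token context window.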

📚 References


📄 License

  • VieNeu-TTS (0.5B): Original terms (Apache 2.0).
  • VieNeu-TTS-0.3B: Released under CC BY-NC 4.0 (Non-Commercial).
    • This version is currently experimental.
    • Commercial use is prohibited without authorization. Please contact the author for commercial licensing.

📑 Citation

@misc{vieneutts2026,
  title        = {VieNeu-TTS: Vietnamese Text-to-Speech with Instant Voice Cloning},
  author       = {Pham Nguyen Ngoc Bao},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/pnnbao-ump/VieNeu-TTS}}
}

🤝 Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m "Add amazing feature"
  4. Push the branch: git push origin feature/amazing-feature
  5. Open a pull request

📞 Support


🙏 Acknowledgements

This project builds upon NeuTTS Air for the original 0.5B model. The 0.3B version is a custom architecture trained from scratch using the VieNeu-TTS-1000h dataset.


Made with ❤️ for the Vietnamese TTS community
