🦜 VieNeu-TTS v2 Turbo (GPU Edition)

The fastest Bilingual (Vietnamese & English) TTS engine with Instant Zero-Shot Voice Cloning.



📖 Model Description

VieNeu-TTS v2 Turbo is the performance-tuned edition of the VieNeu-TTS family. Built on a transformer-based architecture and optimized for minimal latency, it delivers high-fidelity 24 kHz speech synthesis with Instant Voice Cloning capabilities.

This version is designed for GPU-accelerated inference (Standard/Transformers backend), making it ideal for real-time applications, interactive assistants, and creative content generation on platforms like Hugging Face Spaces (ZeroGPU).

✨ Key Features

  • 🦜 Instant Voice Cloning: Clone any voice with just 3–5 seconds of reference audio. Truly zero-shot: no reference text required for v2 Turbo!
  • 🇻🇳🇺🇸 Bilingual (Code-switching): Seamlessly handles mixed Vietnamese–English sentences in a single utterance.
  • 🚀 Extreme Speed: Optimized architecture for ultra-low-latency inference on GPUs.
  • 🔇 AI Watermarking: Every audio output includes an imperceptible identifier for responsible AI content tracing.
  • 🔊 24 kHz High-Fidelity: Studio-quality neural codec output.

🚀 Quickstart

Option 1: Install via the vieneu SDK (Recommended)

```bash
# Minimal installation (Turbo/CPU only)
pip install vieneu

# Optional: pre-built llama-cpp-python wheels for CPU (if building from source fails)
pip install vieneu --extra-index-url https://pnnbao97.github.io/llama-cpp-python-v0.3.16/cpu/

# Optional: macOS Metal acceleration
pip install vieneu --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal/
```
```python
from vieneu import Vieneu

# Initialize in Turbo mode (default, minimal dependencies)
tts = Vieneu()

# 1. Simple synthesis (uses the default Southern male voice 'Xuân Vĩnh')
# Code-switched input: "The power system mainly uses alternating current because it is more efficient."
text = "Hệ thống điện chủ yếu sử dụng alternating current because it is more efficient."
audio = tts.infer(text=text)

# Save to file
tts.save(audio, "output_Xuân Vĩnh.wav")
print("💾 Saved to output_Xuân Vĩnh.wav")

# 2. Use a specific preset voice
voices = tts.list_preset_voices()  # iterated as (description, voice_id) pairs
for desc, voice_id in voices:
    print(f"Voice: {desc} (ID: {voice_id})")

my_voice_id = voices[1][1] if len(voices) > 1 else voices[0][1]  # Phạm Tuyên's voice
voice_data = tts.get_preset_voice(my_voice_id)

# "I am speaking in Doctor Tuyên's voice."
audio_custom = tts.infer(text="Tôi đang nói bằng giọng của Bác sĩ Tuyên.", voice=voice_data)

# Save to file
tts.save(audio_custom, "output_Phạm Tuyên.wav")
print("💾 Saved to output_Phạm Tuyên.wav")
```
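Picking a preset by list position (`voices[1][1]`) depends on the order in which the SDK returns voices. A small helper can look a voice up by its description instead. This is a sketch that assumes only what the loop above implies, namely that `list_preset_voices()` yields `(description, voice_id)` pairs; `find_voice_id` is a hypothetical helper, not part of the vieneu SDK, and the sample data is invented for illustration.

```python
import unicodedata
from typing import Iterable, Optional, Tuple, TypeVar

VoiceId = TypeVar("VoiceId")


def _norm(s: str) -> str:
    # NFC-normalize so precomposed and decomposed Vietnamese diacritics compare equal,
    # then casefold for case-insensitive matching.
    return unicodedata.normalize("NFC", s).casefold()


def find_voice_id(voices: Iterable[Tuple[str, VoiceId]], keyword: str) -> Optional[VoiceId]:
    """Return the ID of the first preset whose description contains `keyword`, else None."""
    needle = _norm(keyword)
    for desc, voice_id in voices:
        if needle in _norm(desc):
            return voice_id
    return None


if __name__ == "__main__":
    # Hypothetical sample data; real entries come from tts.list_preset_voices().
    sample = [("Xuân Vĩnh (Southern male)", "voice_a"), ("Phạm Tuyên", "voice_b")]
    print(find_voice_id(sample, "tuyên"))    # voice_b
    print(find_voice_id(sample, "missing"))  # None
```

In the Quickstart flow this could replace the index lookup, for example `vid = find_voice_id(tts.list_preset_voices(), "Tuyên")`, followed by a `None` check before calling `tts.get_preset_voice(vid)`.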

🦜 Zero-shot Voice Cloning (SDK)

Clone any voice with only 3-5 seconds of audio using the local Turbo engine:

```python
from vieneu import Vieneu

tts = Vieneu()  # Defaults to Turbo mode

# 1. Encode the reference audio (extracts a speaker embedding)
# Supported formats: .wav, .mp3, .flac
my_voice = tts.encode_reference("examples/audio_ref/example.wav")

# 2. Synthesize with the cloned voice
# No reference text required for Turbo v2!
# "This is a voice cloned directly with the VieNeu-TTS SDK."
audio = tts.infer(
    text="Đây là giọng nói được clone trực tiếp bằng SDK của VieNeu-TTS.",
    voice=my_voice,
)

tts.save(audio, "cloned_voice.wav")
```
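Since cloning is described in terms of a 3–5 second reference clip, it can help to check a clip's length before calling `encode_reference`. The following stdlib-only sketch handles uncompressed PCM `.wav` files only (Python's `wave` module cannot read `.mp3` or `.flac`). The 3–5 s bounds come from this card; the helper names and the generated test tone are illustrative assumptions, not part of the vieneu SDK.

```python
import math
import os
import struct
import tempfile
import wave


def reference_duration(path: str) -> float:
    """Length in seconds of an uncompressed PCM .wav file."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def within_clone_window(path: str, min_s: float = 3.0, max_s: float = 5.0) -> bool:
    """True if the clip length falls inside the recommended 3-5 s window."""
    return min_s <= reference_duration(path) <= max_s


def make_test_tone(seconds: float, rate: int = 24_000, freq: float = 220.0) -> str:
    """Write a mono 16-bit sine tone to a temporary .wav file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    n = int(seconds * rate)
    samples = [int(8000 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(struct.pack(f"<{n}h", *samples))
    return path


if __name__ == "__main__":
    ref = make_test_tone(4.0)
    print(reference_duration(ref), within_clone_window(ref))  # 4.0 True
    os.remove(ref)
```

For `.mp3` or `.flac` references, a decoding library would be needed to obtain the frame count; the window check itself stays the same.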

Option 2: Web UI (full repo)

```bash
git clone https://github.com/pnnbao97/VieNeu-TTS.git
cd VieNeu-TTS
uv sync          # minimal install (Turbo/CPU)
uv run vieneu-web
# → Open http://127.0.0.1:7860
```

🔬 Model Architecture

VieNeu-TTS v2 Turbo uses a two-stage pipeline:

  1. Transformer LLM Backbone: A decoder-only transformer that predicts discrete audio tokens from text and speaker embeddings.
  2. Neural Codec (VieNeu-Codec): A high-performance VQ-VAE decoder that converts tokens into a 24 kHz waveform with minimal artifacts.
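The data flow through the two stages can be illustrated with a toy, dependency-free sketch. Everything in it is hypothetical: `toy_backbone` stands in for the transformer, the "codec" is a fixed lookup table of sine frames rather than a learned VQ-VAE decoder, and the hop of 480 samples per token is an assumed value, not VieNeu-Codec's real frame rate. Only the 24 kHz sample rate comes from this card.

```python
import math
from typing import Callable, List, Sequence

SAMPLE_RATE = 24_000  # output rate stated in the model card (24 kHz)
HOP = 480             # assumption: samples decoded per audio token (20 ms frames)
VOCAB = 8             # toy audio-token vocabulary; real codecs use far larger codebooks
EOS = VOCAB           # toy end-of-speech token id

NextToken = Callable[[Sequence[int], int, List[int]], int]


def generate_tokens(text_ids: Sequence[int], speaker: int,
                    next_token: NextToken, max_tokens: int = 100) -> List[int]:
    """Stage 1: autoregressive loop, the way a decoder-only backbone is run."""
    tokens: List[int] = []
    while len(tokens) < max_tokens:
        tok = next_token(text_ids, speaker, tokens)  # condition on text, speaker, history
        if tok == EOS:
            break
        tokens.append(tok)
    return tokens


def build_codebook() -> List[List[float]]:
    """Toy stand-in for a codec: each token id maps to one HOP-sample sine frame."""
    return [[0.3 * math.sin(2 * math.pi * (200 + 50 * k) * n / SAMPLE_RATE)
             for n in range(HOP)] for k in range(VOCAB)]


def decode(tokens: Sequence[int], codebook: List[List[float]]) -> List[float]:
    """Stage 2: turn discrete tokens back into a waveform by concatenating frames."""
    wave: List[float] = []
    for tok in tokens:
        wave.extend(codebook[tok])
    return wave


def toy_backbone(text_ids: Sequence[int], speaker: int, history: List[int]) -> int:
    """Stand-in for the transformer: one token per text id, then EOS."""
    if len(history) >= len(text_ids):
        return EOS
    return (text_ids[len(history)] + speaker) % VOCAB


if __name__ == "__main__":
    tokens = generate_tokens([3, 1, 4, 1, 5], speaker=2, next_token=toy_backbone)
    wave = decode(tokens, build_codebook())
    print(tokens, len(wave), len(wave) / SAMPLE_RATE)  # [5, 3, 6, 3, 7] 2400 0.1
```

The split mirrors the description above: the backbone only ever emits discrete token ids, and audio duration is fixed by the decoder as (number of tokens × samples per token) / sample rate.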

📊 Training Data

Trained on a multi-speaker corpus of over 20,000 hours of high-quality speech:

| Dataset | Language | Description |
|---|---|---|
| pnnbao-ump/VieNeu-TTS-1000h | Vietnamese | DeepMind/Vietnamese studio-quality corpus |
| pnnbao-ump/vietnamese-audio-corpus | Vietnamese | Large-scale multi-accent Vietnamese data |
| amphion/Emilia-Dataset | Multilingual | Large-scale multilingual diverse speech |
| facebook/multilingual_librispeech | English | Extensive English read speech |

๐Ÿ—บ๏ธ Roadmap

  • Turbo GPU (Transformers) Engine
  • Bilingual (Vietnamese–English) Support
  • Zero-shot Voice Cloning
  • Mobile SDK (Android / iOS)
  • Streaming API Integration

๐Ÿค Support & Links

| Resource | Link |
|---|---|
| 🐙 GitHub | pnnbao97/VieNeu-TTS |
| 📖 Documentation | docs.vieneu.io |
| 📦 PyPI | pip install vieneu |
| 💬 Discord | Join here |

📄 License

Released under the Apache License 2.0, which permits both personal and commercial use.


Made with ❤️ for the Vietnamese TTS community by @pnnbao97 and contributors.

Model size: 0.1B params (Safetensors, tensor type BF16)
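As a rough sizing aid (an estimate, not a measured figure), weight memory is parameter count × bytes per parameter, and BF16 stores 2 bytes per parameter. This sketch assumes exactly 0.1B parameters (the listed size is rounded) and counts weights only: activations, framework overhead, and any separately shipped codec weights are not included. The dtype table is standard; the helper name is illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_bytes(num_params: float, dtype: str = "bf16") -> float:
    """Approximate bytes needed to hold the weights alone in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 0.1e9  # listed model size (rounded)
    for dtype in ("fp32", "bf16", "int8"):
        b = weight_bytes(params, dtype)
        print(f"{dtype}: {b / 1e6:.0f} MB ({b / 2**30:.2f} GiB)")
```

At BF16 this works out to about 200 MB of weights, small enough to fit comfortably alongside other workloads on a single consumer GPU.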