	---
	license: apache-2.0
	language:
	- vi
	---

	# VieNeu-Codec: The Heart of VieNeu-TTS v2

	VieNeu-Codec is the audio engine built specifically for the upcoming VieNeu-TTS v2. It is a neural audio codec trained on over 20,000 hours of diverse Vietnamese and English speech data, with the aim of robust, natural-sounding prosody and clean audio reconstruction.

	This repository provides the optimized ONNX versions of the VieNeu-Codec for production use.

	## 🚀 Key Features

	- 24kHz High-Fidelity: Crystal clear audio reconstruction optimized for the Vietnamese language.
	- Zero-Shot Voice Cloning: Clone any voice with just 5 seconds of reference audio.
	- Optimized for VieNeu-TTS v2: Seamlessly integrates with the next-generation LLM backbone of VieNeu-TTS.
	- Two Deployment Modes: Includes both FP32 (High Quality) and INT8 (High Speed) decoders.

	## 📦 Model Components

	- `vieneu_decoder.onnx`: (FP32) High-fidelity audio decoder for maximum quality.
	- `vieneu_decoder_int8.onnx`: (INT8) Quantized decoder for fast CPU inference.
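
As a minimal sketch of choosing between the two files, the helper below maps a precision label to the filenames listed above and opens the chosen decoder with ONNX Runtime. The names `decoder_filename` and `load_decoder`, the `model_dir` parameter, and the choice of `CPUExecutionProvider` are illustrative assumptions and are not part of this repository.

```python
from pathlib import Path

# Filenames as listed in this model card.
DECODER_FILES = {
    "fp32": "vieneu_decoder.onnx",
    "int8": "vieneu_decoder_int8.onnx",
}


def decoder_filename(precision: str = "fp32") -> str:
    """Return the decoder filename for a precision label (hypothetical helper)."""
    key = precision.lower()
    if key not in DECODER_FILES:
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {sorted(DECODER_FILES)}"
        )
    return DECODER_FILES[key]


def load_decoder(model_dir: str = ".", precision: str = "fp32"):
    """Open the selected decoder with ONNX Runtime (hypothetical helper)."""
    import onnxruntime as ort  # imported lazily so decoder_filename works without it

    path = Path(model_dir) / decoder_filename(precision)
    # Assumption: CPU execution; choose another provider if your build supports one.
    return ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])
```

For example, `load_decoder(".", "int8")` would open the quantized decoder from the current directory. Calling `get_inputs()` on the returned session lists the input names and shapes the model actually expects, which is worth checking before building the feed dictionary.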

	## 🛠️ Usage

	### Synthesize Speech
	Combine the speaker embedding with content tokens from your LLM (VieNeu-TTS v2):
	```python
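	import onnxruntime as ort

	# Load the FP32 decoder; swap in "vieneu_decoder_int8.onnx" for faster CPU inference.
	sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
	# `ids` (content tokens from the LLM) and `embedding` (speaker embedding) come from upstream steps.
	audio = sess_dec.run(None, {
	    "content_ids": ids,
	    "voice": embedding,
	})[0]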
	```
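
The decoder's output layout is not documented here, so the following is only a sketch of one way to save the result. It assumes `audio` holds a mono floating-point waveform in the range [-1.0, 1.0] at the 24 kHz rate listed under Key Features. The helper name `save_wav` is hypothetical, and it uses only the Python standard library.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid int16 overflow
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono (assumption)
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)  # 24 kHz by default
        wf.writeframes(bytes(pcm))
    return len(pcm) // 2  # number of samples written
```

If `audio` is a NumPy array, something like `save_wav(audio.reshape(-1).tolist(), "output.wav")` would flatten it first. Adjust the clipping and scaling if the model turns out to emit a different range or dtype.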

	## 📄 License & Attribution

	- License: Apache 2.0
	- Author: Pham Nguyen Ngoc Bao
	- Project: VieNeu-Codec (for VieNeu-TTS v2)
	- Version: 2.0