---
license: apache-2.0
language:
- vi
---

# VieNeu-Codec: The Heart of VieNeu-TTS v2

**VieNeu-Codec** is the audio engine built specifically for the upcoming **VieNeu-TTS v2**. It is a neural audio codec trained on more than **20,000 hours** of diverse Vietnamese and English speech, with the goal of robust, natural-sounding, and clear audio reconstruction.

This repository provides optimized ONNX versions of VieNeu-Codec for production use.

## Key Features

- **24 kHz High-Fidelity**: Audio reconstruction at a 24 kHz sample rate, optimized for the Vietnamese language.
- **Zero-Shot Voice Cloning**: Clone a voice from just 5 seconds of reference audio.
- **Optimized for VieNeu-TTS v2**: Integrates with the next-generation LLM backbone of VieNeu-TTS.
- **Two Deployment Modes**: Includes both FP32 (high quality) and INT8 (high speed) decoders.

## Model Components

- **`vieneu_decoder.onnx`**: (FP32) High-fidelity audio decoder for maximum quality.
- **`vieneu_decoder_int8.onnx`**: (INT8) Quantized decoder for fast CPU inference.

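Choosing between the two decoder files can be wrapped in a small helper. The sketch below is illustrative only: `pick_decoder`, the `"fp32"`/`"int8"` keys, and the assumption that both files sit in one local directory are hypothetical and not part of this repository.

```python
from pathlib import Path

# File names as listed in this README; the dictionary keys are hypothetical labels.
DECODER_FILES = {
    "fp32": "vieneu_decoder.onnx",       # maximum quality
    "int8": "vieneu_decoder_int8.onnx",  # fast CPU inference
}


def pick_decoder(model_dir, precision="fp32"):
    """Return the path of the decoder file for the requested precision."""
    try:
        filename = DECODER_FILES[precision]
    except KeyError:
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {sorted(DECODER_FILES)}"
        ) from None
    return Path(model_dir) / filename
```

The returned path can then be handed to ONNX Runtime, for example `ort.InferenceSession(str(pick_decoder(".", "int8")))`.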
## Usage

### Synthesize Speech
Combine the speaker embedding with the content tokens from your LLM (VieNeu-TTS v2):
```python
import onnxruntime as ort

# `ids`: content token IDs produced by the VieNeu-TTS v2 LLM.
# `embedding`: speaker embedding for the target voice.
sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
audio = sess_dec.run(None, {
    "content_ids": ids,
    "voice": embedding,
})[0]  # first model output: the decoded audio
```
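This README does not state the expected input shapes and dtypes or the output format, so it can help to query them from the session and then write the result to disk. The following is a minimal sketch, not the project's own tooling: `describe_inputs` and `save_wav` are hypothetical helper names, and the code assumes the decoder returns a mono float waveform in the range [-1.0, 1.0] at the 24 kHz rate mentioned above.

```python
import array
import sys
import wave

SAMPLE_RATE = 24_000  # 24 kHz, per the feature list; assumed to be the decoder's output rate


def describe_inputs(session):
    """Return (name, shape, type) for each input of an ONNX Runtime session."""
    return [(i.name, i.shape, i.type) for i in session.get_inputs()]


def float_to_pcm16(samples):
    """Convert float samples to 16-bit PCM, clipping values outside [-1.0, 1.0]."""
    pcm = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(s * 32767)))
    return pcm


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit WAV file; returns the number of frames written."""
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # assumption: mono output
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)
```

With the session from the snippet above, `describe_inputs(sess_dec)` lists what `content_ids` and `voice` are expected to look like, and `save_wav(audio.reshape(-1), "output.wav")` stores the flattened waveform. If the decoder's output turns out to use a different range or layout, the conversion step needs adjusting.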

## License & Attribution
- Author: **Pham Nguyen Ngoc Bao**
- Project: **VieNeu-Codec (for VieNeu-TTS v2)**
- Version: 2.0