---
license: apache-2.0
language:
- vi
---

# VieNeu-Codec: The Heart of VieNeu-TTS v2

**VieNeu-Codec** is the audio engine built specifically for the upcoming **VieNeu-TTS v2**. It is a neural audio codec trained on over **20,000 hours** of diverse Vietnamese and English speech data, with the goal of robust, natural-sounding, and clear audio reconstruction.

This repository provides optimized ONNX exports of the VieNeu-Codec decoder for production use.

## 🚀 Key Features

- **24 kHz High-Fidelity**: High-quality audio reconstruction optimized for the Vietnamese language.
- **Zero-Shot Voice Cloning**: Clone any voice with just 5 seconds of reference audio.
- **Optimized for VieNeu-TTS v2**: Seamlessly integrates with the next-generation LLM backbone of VieNeu-TTS.
- **Two Deployment Modes**: Includes both FP32 (High Quality) and INT8 (High Speed) decoders.

## 📦 Model Components

- **`vieneu_decoder.onnx`**: (FP32) High-fidelity audio decoder for maximum quality.
- **`vieneu_decoder_int8.onnx`**: (INT8) Quantized decoder for fast CPU inference.
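
Choosing between the two files can be wrapped in a small helper. The sketch below is illustrative only: the function name `decoder_path`, the `precision` keys, and the `model_dir` argument are assumptions, not part of this repository. Only the two file names come from the list above.

```python
from pathlib import Path

# File names as listed under "Model Components".
DECODERS = {
    "fp32": "vieneu_decoder.onnx",       # maximum quality
    "int8": "vieneu_decoder_int8.onnx",  # faster CPU inference
}

def decoder_path(precision="fp32", model_dir="."):
    """Return the decoder file path for the requested precision.

    `precision` and `model_dir` are hypothetical parameters used for
    illustration; adapt them to wherever the files were downloaded.
    """
    try:
        return Path(model_dir) / DECODERS[precision]
    except KeyError:
        raise ValueError(f"unknown precision {precision!r}; expected one of {sorted(DECODERS)}")
```

The returned path can then be converted with `str(...)` and passed to `ort.InferenceSession`.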

## 🛠️ Usage

### Synthesize Speech
Combine the speaker embedding with content tokens from your LLM (VieNeu-TTS v2):
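```python
import onnxruntime as ort

# `ids`: content token IDs produced by the VieNeu-TTS v2 LLM
# `embedding`: speaker embedding extracted from the reference audio
sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
audio = sess_dec.run(None, {
    "content_ids": ids,
    "voice": embedding
})[0]
```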
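
To listen to the result, the decoded samples can be written to a WAV file. This is a minimal sketch using only the Python standard library. It assumes the decoder output is mono floating-point audio in the range [-1, 1] at a 24 kHz sample rate; the README does not document the output shape or dtype, so verify both before relying on it. The helper name `write_wav` is hypothetical.

```python
import struct
import wave

def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples as 16-bit PCM and return the frame count.

    Assumption: `samples` is a flat iterable of floats in [-1, 1].
    Values outside that range are clipped.
    """
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        # WAV stores PCM as little-endian signed 16-bit integers.
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)
```

If `audio` is the NumPy array returned by the decoder, `write_wav("output.wav", audio.reshape(-1))` flattens it and writes the file.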

## 📄 License & Attribution
Author: **Pham Nguyen Ngoc Bao**  
Project: **VieNeu-Codec (for VieNeu-TTS v2)**  
Version: 2.0