Update README.md
README.md
CHANGED
@@ -27,13 +27,13 @@ Vira-TTS is a neural TTS model that can synthesize natural Vietnamese speech fro
 | Property | Value |
 |----------|-------|
 | Base Architecture | Qwen2-0.5B |
-| Audio Codec | BiCodec |
-| Sample Rate |
+| Audio Codec | Fash-BiCodec |
+| Sample Rate | 16kHz (native), 48kHz (with FlashSR) |
 | Language | Vietnamese |
 
 ## Features
 
-- **Zero-shot Voice Cloning**: Clone any voice from a short reference audio (3-
+- **Zero-shot Voice Cloning**: Clone any voice from a short reference audio (3-12 seconds recommended)
 - **Vietnamese Optimized**: Finetuned specifically for Vietnamese pronunciation and prosody
 - **Text Normalization**: Automatic conversion of numbers and abbreviations to spoken form
 - **High Quality Output**: 48kHz audio with FlashSR upsampling
@@ -91,7 +91,7 @@ The model includes automatic Vietnamese text normalization:
 ## Requirements
 
 - Python >= 3.10
-- CUDA compatible GPU (recommended:
+- CUDA compatible GPU (recommended: 6GB+ VRAM)
 - Dependencies: lmdeploy, fastaudiosr, ncodec, gradio
 
 ## Limitations
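
The updated Features line recommends a 3-12 second reference clip for voice cloning. As a rough illustration of how a caller might check a clip against that window before passing it to the model, here is a minimal sketch using only the Python standard library. The helper names `clip_duration_seconds` and `is_recommended_length` are hypothetical and not part of Vira-TTS, and the sketch assumes the reference audio is an uncompressed PCM WAV file.

```python
import wave

# Recommended reference length quoted from the README change (3-12 seconds).
MIN_SECONDS = 3.0
MAX_SECONDS = 12.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        # Duration is the frame count divided by the sample rate.
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(seconds: float) -> bool:
    """True if the duration falls inside the recommended 3-12 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

The README wording is a recommendation rather than a hard limit, so a caller would probably warn on clips outside the window instead of rejecting them.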