dolly-vn/Vira-TTS


dinhthuan committed on Jan 31

Commit e02c6c6 · verified · 1 Parent(s): 73ea0f3

Update README.md

Files changed (1)

README.md +4 -4


@@ -27,13 +27,13 @@ Vira-TTS is a neural TTS model that can synthesize natural Vietnamese speech fro
 | Property | Value |
 |----------|-------|
 | Base Architecture | Qwen2-0.5B |
-| Audio Codec | BiCodec |
-| Sample Rate | 24kHz (native), 48kHz (with FlashSR) |
+| Audio Codec | Fash-BiCodec |
+| Sample Rate | 16kHz (native), 48kHz (with FlashSR) |
 | Language | Vietnamese |
 ## Features
-- **Zero-shot Voice Cloning**: Clone any voice from a short reference audio (3-10 seconds recommended)
+- **Zero-shot Voice Cloning**: Clone any voice from a short reference audio (3-12 seconds recommended)
 - **Vietnamese Optimized**: Finetuned specifically for Vietnamese pronunciation and prosody
 - **Text Normalization**: Automatic conversion of numbers and abbreviations to spoken form
 - **High Quality Output**: 48kHz audio with FlashSR upsampling
@@ -91,7 +91,7 @@ The model includes automatic Vietnamese text normalization:
 ## Requirements
 - Python >= 3.10
-- CUDA compatible GPU (recommended: 8GB+ VRAM)
+- CUDA compatible GPU (recommended: 6GB+ VRAM)
 - Dependencies: lmdeploy, fastaudiosr, ncodec, gradio
 ## Limitations
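
The values this commit leaves in the README (Python >= 3.10, reference clips of 3-12 seconds recommended) could be checked before running the model with something like the sketch below. It is a standard-library illustration only, not part of Vira-TTS: the helper names `python_ok`, `clip_seconds`, and `reference_in_range` are hypothetical, and it assumes the reference clip is an uncompressed WAV file.

```python
import sys
import wave

# Values taken from the README text in this diff.
MIN_PYTHON = (3, 10)                # "Python >= 3.10"
RECOMMENDED_SECONDS = (3.0, 12.0)   # "3-12 seconds recommended"


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the README's minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def clip_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_in_range(seconds: float) -> bool:
    """Return True if a clip length falls in the recommended range."""
    low, high = RECOMMENDED_SECONDS
    return low <= seconds <= high
```

A caller might run `reference_in_range(clip_seconds("ref.wav"))` on a clip before passing it on for voice cloning; `ref.wav` is a placeholder path. The range is only a recommendation in the README, so a clip outside it may warrant a warning, not a hard failure.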