Commit e02c6c6 · verified · committed by dinhthuan · 1 Parent(s): 73ea0f3

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -27,13 +27,13 @@ Vira-TTS is a neural TTS model that can synthesize natural Vietnamese speech fro
  | Property | Value |
  |----------|-------|
  | Base Architecture | Qwen2-0.5B |
- | Audio Codec | BiCodec |
- | Sample Rate | 24kHz (native), 48kHz (with FlashSR) |
+ | Audio Codec | Fash-BiCodec |
+ | Sample Rate | 16kHz (native), 48kHz (with FlashSR) |
  | Language | Vietnamese |

  ## Features

- - **Zero-shot Voice Cloning**: Clone any voice from a short reference audio (3-10 seconds recommended)
+ - **Zero-shot Voice Cloning**: Clone any voice from a short reference audio (3-12 seconds recommended)
  - **Vietnamese Optimized**: Finetuned specifically for Vietnamese pronunciation and prosody
  - **Text Normalization**: Automatic conversion of numbers and abbreviations to spoken form
  - **High Quality Output**: 48kHz audio with FlashSR upsampling
@@ -91,7 +91,7 @@ The model includes automatic Vietnamese text normalization:
  ## Requirements

  - Python >= 3.10
- - CUDA compatible GPU (recommended: 8GB+ VRAM)
+ - CUDA compatible GPU (recommended: 6GB+ VRAM)
  - Dependencies: lmdeploy, fastaudiosr, ncodec, gradio

  ## Limitations
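
Review note: the updated README recommends a 3-12 second reference clip for voice cloning. The sketch below shows one way a caller might check a WAV clip against that range before synthesis. It uses only the Python standard library. The helper names (`wav_duration_seconds`, `is_recommended_reference`, `make_silent_wav`) are hypothetical and not part of Vira-TTS, and the check assumes an uncompressed PCM WAV input.

```python
import io
import wave

# Recommended reference-clip length, taken from the updated README line (3-12 seconds).
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 12.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV given a path string or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_reference(source) -> bool:
    """Hypothetical helper: True if the clip length falls within 3-12 seconds."""
    return MIN_REF_SECONDS <= wav_duration_seconds(source) <= MAX_REF_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)               # rewind so the buffer can be read back
    return buffer
```

For example, `is_recommended_reference(make_silent_wav(5.0))` passes the check, while a 1 second or 13 second clip does not. The 16000 Hz default only mirrors the native sample rate stated in the diff; the duration check itself is independent of the sample rate.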