- ravdess: 1 hour
- vctk: 41 hours
- Private data: 16 hours

## 🆕 Mamba-based Text Encoder (Experimental Update)

We introduce an experimental Mamba-based text encoder variant of MeloVC, replacing the original Transformer encoder with a state-space model (Mamba) to improve long-sequence modeling efficiency and inference stability.

### 🔬 Motivation

While Transformer-based encoders perform well, they suffer from:

- Quadratic complexity with sequence length
- High memory overhead during inference

Mamba provides:

- Linear-time sequence modeling
- Better scalability for long and mixed-language text
- More stable inference on limited GPU memory
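
The complexity contrast above can be sketched with a toy diagonal state-space recurrence. This is illustrative only: it is not the MeloVC encoder, and all parameters below are made up.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Linear-time state-space recurrence over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise state update)
    y_t = c . h_t                 (readout)

    One pass over the sequence costs O(T * d) time and O(d) memory,
    versus the O(T^2) pairwise attention of a Transformer encoder.
    """
    h = np.zeros_like(a, dtype=float)
    ys = []
    for x_t in x:
        h = a * h + b * x_t      # state update, O(d) per step
        ys.append(float(c @ h))  # readout, O(d) per step
    return ys

# Toy parameters (illustrative, not trained weights)
a = np.array([0.5, 0.9])   # per-channel state decay
b = np.array([1.0, 1.0])   # input gain
c = np.array([1.0, 0.0])   # readout vector

print(ssm_scan([1.0, 0.0, 0.0], a, b, c))  # impulse response: [1.0, 0.5, 0.25]
```

The key point is that memory stays fixed at the state size `d` no matter how long the text is, which is what makes long and mixed-language inputs cheaper than attention.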

### 📦 Available Mamba Checkpoints

| Component | File |
| --- | --- |
| Generator | `G_Mamba_30000.pth` |
| Discriminator | `D_Mamba_30000.pth` |
| Duration Predictor | `DUR_Mamba_30000.pth` |
| Config | `config_Mamba.json` |
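
A minimal sketch of pulling one of these checkpoints into memory with PyTorch. The file names come from the table above; the dict labels and the helper are our own, and how MeloVC's loading entry point consumes the state dicts is not shown here, so treat this as inspection only.

```python
# Checkpoint file names as listed in the table; dict keys are our own labels.
MAMBA_CKPTS = {
    "generator": "G_Mamba_30000.pth",
    "discriminator": "D_Mamba_30000.pth",
    "duration_predictor": "DUR_Mamba_30000.pth",
}

def load_state_dict(path):
    import torch  # heavyweight dependency; imported lazily here
    # map_location="cpu" lets you inspect weights without a GPU
    return torch.load(path, map_location="cpu")

# Example (requires the downloaded files):
# gen_state = load_state_dict(MAMBA_CKPTS["generator"])
```

`config_Mamba.json` is a plain JSON config and is read separately (e.g. with `json.load`) rather than with `torch.load`.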
⚠️ Note: This variant is experimental. Prosody and expressiveness may differ slightly from the Transformer version.
## 🚀 Quick Start