- vctk: 41 hours
- Private data: 16 hours

## 🆕 Mamba-based Text Encoder (Experimental Update)

We introduce an experimental Mamba-based text encoder variant of MeloVC, replacing the original Transformer encoder with a state-space model (Mamba) to improve long-sequence modeling efficiency and inference stability.

### 🔬 Motivation

While Transformer-based encoders perform well, they suffer from:

…

- Better scalability for long and mixed-language text
- More stable inference on limited GPU memory

### 📦 Available Mamba Checkpoints

| Component          | File                  |
| ------------------ | --------------------- |
| Generator          | `G_Mamba_30000.pth`   |
| Discriminator      | `D_Mamba_30000.pth`   |
| Duration Predictor | `DUR_Mamba_30000.pth` |
| Config             | `config_Mamba.json`   |

⚠️ Note: This variant is experimental. Prosody and expressiveness may differ slightly from the Transformer version.
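The checkpoint files listed above are standard PyTorch `.pth` files. The sketch below shows the usual state-dict save/load round trip with a tiny stand-in module; the actual MeloVC Mamba encoder class, its state-dict keys, and the exact contents of the released checkpoints are defined by the repo, not here.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in module only -- NOT the real Mamba encoder architecture.
encoder = nn.Sequential(nn.Embedding(256, 64), nn.Linear(64, 64))

with tempfile.TemporaryDirectory() as tmp:
    # File name borrowed from the table above for illustration.
    path = os.path.join(tmp, "G_Mamba_30000.pth")
    torch.save(encoder.state_dict(), path)

    # map_location="cpu" keeps loading safe on machines without a GPU,
    # in line with the "stable inference on limited GPU memory" goal.
    state = torch.load(path, map_location="cpu")
    encoder.load_state_dict(state)

print(sorted(state.keys()))
```

If the released checkpoints wrap the weights in a larger dict (e.g. alongside optimizer state or an iteration counter), you would index into that dict before calling `load_state_dict`; inspect the loaded object's keys first.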