- ravdess: 1 hour
- vctk: 41 hours
- Private data: 16 hours

## 🆕 Mamba-based Text Encoder (Experimental Update)

We introduce an experimental Mamba-based text encoder variant of MeloVC, replacing the original Transformer encoder with a state-space model (Mamba) to improve long-sequence modeling efficiency and inference stability.

### 🔬 Motivation

While Transformer-based encoders perform well, they suffer from:

- Quadratic complexity with sequence length
- High memory overhead during inference

Mamba provides:

- Linear-time sequence modeling
- Better scalability for long and mixed-language text
- More stable inference on limited GPU memory
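
The complexity contrast above can be sketched with a toy diagonal state-space recurrence. This is illustrative only: it is not the MeloVC encoder, and all parameters below are made up.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Linear-time state-space recurrence over a scalar input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise state update)
    y_t = c . h_t                 (readout)

    One pass over the sequence costs O(T * d) time and O(d) memory,
    versus the O(T^2) pairwise attention of a Transformer encoder.
    """
    h = np.zeros_like(a, dtype=float)
    ys = []
    for x_t in x:
        h = a * h + b * x_t      # state update, O(d) per step
        ys.append(float(c @ h))  # readout, O(d) per step
    return ys

# Toy parameters (illustrative, not trained weights)
a = np.array([0.5, 0.9])   # per-channel state decay
b = np.array([1.0, 1.0])   # input gain
c = np.array([1.0, 0.0])   # readout vector

print(ssm_scan([1.0, 0.0, 0.0], a, b, c))  # impulse response: [1.0, 0.5, 0.25]
```

The key point is that memory stays fixed at the state size `d` no matter how long the text is, which is what makes long and mixed-language inputs cheaper than attention.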

### 📦 Available Mamba Checkpoints

| Component | File |
| --- | --- |
| Generator | `G_Mamba_30000.pth` |
| Discriminator | `D_Mamba_30000.pth` |
| Duration Predictor | `DUR_Mamba_30000.pth` |
| Config | `config_Mamba.json` |
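
A minimal sketch of pulling one of these checkpoints into memory with PyTorch. The file names come from the table above; the dict labels and the helper are our own, and how MeloVC's loading entry point consumes the state dicts is not shown here, so treat this as inspection only.

```python
# Checkpoint file names as listed in the table; dict keys are our own labels.
MAMBA_CKPTS = {
    "generator": "G_Mamba_30000.pth",
    "discriminator": "D_Mamba_30000.pth",
    "duration_predictor": "DUR_Mamba_30000.pth",
}

def load_state_dict(path):
    import torch  # heavyweight dependency; imported lazily here
    # map_location="cpu" lets you inspect weights without a GPU
    return torch.load(path, map_location="cpu")

# Example (requires the downloaded files):
# gen_state = load_state_dict(MAMBA_CKPTS["generator"])
```

`config_Mamba.json` is a plain JSON config and is read separately (e.g. with `json.load`) rather than with `torch.load`.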
⚠️ Note: This variant is experimental. Prosody and expressiveness may differ slightly from the Transformer version.
## 🚀 Quick Start