Add README for Mæstræa mirror
README.md
ADDED
@@ -0,0 +1,79 @@
---
license: mit
tags:
- audio
- voice-conversion
- singing-voice
- speech-synthesis
- vevo2
- amphion
- safetensors
- maestraea
pipeline_tag: audio-to-audio
base_model: amphion/Vevo
---

# Vevo2 Models (Mæstræa Mirror)

**Singing Voice Synthesis, Conversion & Editing**

[Original Model](https://huggingface.co/amphion/Vevo) by [OpenMMLab / Amphion](https://github.com/open-mmlab/Amphion) · MIT License

> This is a mirror of the Vevo2 model weights for use with [Mæstræa AI Workstation](https://github.com/AEmotionStudio/Maestraea). All credits go to the original authors.

## What's in This Repo

| Path | Description | Size |
|------|-------------|------|
| `contentstyle_modeling/PhoneToVq8192/model.safetensors` | AR model (Qwen2.5-0.5B, ~500M params) | ~2.5 GB |
| `contentstyle_modeling/Vq32ToVq8192/model.safetensors` | Style transfer model | ~1.5 GB |
| `acoustic_modeling/Vq8192ToMels/model.safetensors` | Flow matching model (~350M params) | ~1.4 GB |
| `acoustic_modeling/Vocoder/model*.safetensors` | Vocos vocoder (~250M params) | ~1 GB |
| `tokenizer/vq32/` | HuBERT tokenizer (pickle + config) | ~1.3 GB |
| `tokenizer/vq8192/model.safetensors` | VQ8192 tokenizer | ~200 MB |

**Total: ~8 GB**
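
The stated total can be cross-checked against the per-path sizes in the table. The sketch below copies those approximate sizes into a mapping and sums them; `APPROX_SIZES_GB` and `total_size_gb` are illustrative names, not part of any Mæstræa or Amphion API.

```python
# Approximate sizes in GB, copied from the table above.
# APPROX_SIZES_GB is an illustrative name, not part of any Mæstræa or Amphion API.
APPROX_SIZES_GB = {
    "contentstyle_modeling/PhoneToVq8192/model.safetensors": 2.5,
    "contentstyle_modeling/Vq32ToVq8192/model.safetensors": 1.5,
    "acoustic_modeling/Vq8192ToMels/model.safetensors": 1.4,
    "acoustic_modeling/Vocoder/model*.safetensors": 1.0,
    "tokenizer/vq32/": 1.3,
    "tokenizer/vq8192/model.safetensors": 0.2,
}


def total_size_gb(sizes: dict[str, float]) -> float:
    """Sum the approximate sizes, rounded to one decimal place."""
    return round(sum(sizes.values()), 1)


if __name__ == "__main__":
    print(f"Approximate total: {total_size_gb(APPROX_SIZES_GB)} GB")
```

The listed sizes add up to about 7.9 GB, which is consistent with the rounded total of ~8 GB.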

## What Vevo2 Does

Vevo2 is a voice conversion and singing voice synthesis system from the Amphion toolkit. It supports:

- **Voice Conversion**: Transform vocals to a target voice or timbre
- **Singing Voice Synthesis**: Generate singing from text + melody
- **Speech Editing**: Modify speech content while preserving speaker identity
- **Zero-Shot TTS**: Generate speech in any voice from a short reference

### Architecture

- **AR Model** (Qwen2.5-0.5B): Autoregressive content-style modeling
- **FM Model** (~350M): Flow matching for acoustic generation
- **Vocos Vocoder** (~250M): High-quality waveform synthesis
- **Total: ~1.1B parameters**
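
As a rough sanity check, the parameter counts in this list can be related to the checkpoint sizes in the repo table. The sketch below assumes 4 bytes per parameter (32-bit floats) and 1 GB = 10^9 bytes. Neither assumption is stated in the source, and `estimate_weight_gb` is a hypothetical helper.

```python
# Approximate parameter counts from the Architecture list.
PARAMS = {
    "AR model (Qwen2.5-0.5B)": 500e6,
    "FM model": 350e6,
    "Vocos vocoder": 250e6,
}


def estimate_weight_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Estimate checkpoint size in GB; 4 bytes/param (fp32) is an assumption."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    total = sum(PARAMS.values())
    print(f"Total parameters: {total / 1e9:.1f}B")
    for name, count in PARAMS.items():
        print(f"{name}: ~{estimate_weight_gb(count):.1f} GB at 4 bytes/param")
```

Under these assumptions the FM model and vocoder come out at about 1.4 GB and 1.0 GB, in line with the listed file sizes. The AR model estimate is about 2.0 GB against a listed ~2.5 GB; the source does not explain the difference.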

### VRAM Requirements

| Reference Length | VRAM |
|-----------------|------|
| 15s | ~8 GB |
| 30s | ~10 GB |
| 45s | ~12 GB |

Recommended: Keep reference audio to 15–45 seconds.
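
The three table rows grow by roughly 2 GB for every extra 15 seconds of reference audio. For lengths between the listed points, a minimal sketch is to interpolate linearly. That is an assumption (only the three rows are documented), and `estimate_vram_gb` is a hypothetical helper, not a Mæstræa function.

```python
# (reference length in seconds, approximate VRAM in GB), from the table above.
VRAM_TABLE = [(15, 8.0), (30, 10.0), (45, 12.0)]


def estimate_vram_gb(reference_seconds: float) -> float:
    """Linearly interpolate VRAM between the documented points.

    Interpolation is an assumption; only the three table rows are documented.
    Raises ValueError outside the documented 15-45 s range.
    """
    lo_s, hi_s = VRAM_TABLE[0][0], VRAM_TABLE[-1][0]
    if not lo_s <= reference_seconds <= hi_s:
        raise ValueError(f"reference length must be within {lo_s}-{hi_s} s")
    for (s0, v0), (s1, v1) in zip(VRAM_TABLE, VRAM_TABLE[1:]):
        if s0 <= reference_seconds <= s1:
            frac = (reference_seconds - s0) / (s1 - s0)
            return v0 + frac * (v1 - v0)
    raise AssertionError("unreachable: table covers the validated range")
```

For example, `estimate_vram_gb(22.5)` returns `9.0`. Lengths outside the recommended range raise an error rather than extrapolating.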

## Usage with Mæstræa

The Mæstræa AI Workstation backend downloads these models automatically. To install them manually, place the files in:

```
~/.maestraea/models/vevo2/
```
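
After a manual install, it can help to confirm that the directory matches the layout in the repo table. The sketch below is one way to do that with the standard library. `EXPECTED` and `missing_entries` are hypothetical names, and the actual Mæstræa backend may validate its models differently.

```python
import os
from pathlib import Path

# Default location from this README; expanduser resolves "~" when possible.
DEFAULT_ROOT = Path(os.path.expanduser("~/.maestraea/models/vevo2"))

# Patterns relative to the model root, copied from the repo table.
# A trailing "/" marks a directory; "*" is a glob wildcard.
EXPECTED = [
    "contentstyle_modeling/PhoneToVq8192/model.safetensors",
    "contentstyle_modeling/Vq32ToVq8192/model.safetensors",
    "acoustic_modeling/Vq8192ToMels/model.safetensors",
    "acoustic_modeling/Vocoder/model*.safetensors",
    "tokenizer/vq32/",
    "tokenizer/vq8192/model.safetensors",
]


def missing_entries(root: Path = DEFAULT_ROOT) -> list[str]:
    """Return the expected patterns that have no match under root."""
    missing = []
    for pattern in EXPECTED:
        if pattern.endswith("/"):
            found = (root / pattern).is_dir()
        else:
            found = any(p.is_file() for p in root.glob(pattern))
        if not found:
            missing.append(pattern)
    return missing


if __name__ == "__main__":
    gaps = missing_entries()
    print("All expected files present" if not gaps else f"Missing: {gaps}")
```

The check only looks at file presence, not at sizes or checksums, so a truncated download would still pass.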

## License

MIT, same as the original Amphion/Vevo2 release.

## Credits

- **Model**: [Amphion Vevo2](https://github.com/open-mmlab/Amphion/tree/main/models/vc/vevo2)
- **Paper**: See [Amphion repository](https://github.com/open-mmlab/Amphion) for citation
- **Mirror by**: [AEmotionStudio](https://huggingface.co/AEmotionStudio)