---
license: cc-by-nc-4.0
tags:
- audiocraft
- melodyflow
- music-generation
- music-editing
- flow-matching
language:
- en
---
# MelodyFlow: AEmotionStudio mirror

A 1:1 mirror of [facebook/melodyflow-t24-30secs](https://huggingface.co/facebook/melodyflow-t24-30secs), used by the **MelodyFlow** panel (Design → MelodyFlow) of the MAESTRO / Æmotion Studio AI Workstation.
## License: Non-Commercial

**Weights:** CC-BY-NC-4.0. Generated outputs may NOT be used in commercial projects, paid releases, or client work.

**Code (audiocraft):** MIT. MelodyFlow's inference code lives in the [`facebook/MelodyFlow`](https://huggingface.co/spaces/facebook/MelodyFlow) Hugging Face Space; Meta uploaded it there but never merged it into audiocraft `main`. MAESTRO vendors that Space's `audiocraft/` subtree under `backend/ai/melodyflow_pkg/`. The non-commercial clause attaches only to the weights and to anything derived from running them, not to the MIT-licensed code.
## Format

This mirror keeps the upstream `.bin` layout (PyTorch pickle) verbatim: `state_dict.bin` (the flow-matching DiT language model) plus `compression_state_dict.bin` (the EnCodec compression model, 2-channel / 32 kHz). We do NOT convert to safetensors here because the vendored audiocraft loader expects pickled `{xp.cfg, best_state}` packages and reads the OmegaConf cfg blob alongside the tensor dict in one `torch.load` call. Splitting the cfg into a sidecar file would require a custom loader, so that work is deferred.
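As a rough illustration of that layout, the sketch below pulls the two pieces out of an already-loaded package. The helper name `split_package` is hypothetical and not part of audiocraft; only the key names `xp.cfg` and `best_state` come from the description above.

```python
def split_package(package):
    """Split a checkpoint package into (cfg_blob, tensor_dict).

    Hypothetical helper: assumes `package` is the dict returned by a single
    torch.load call on state_dict.bin or compression_state_dict.bin.
    """
    missing = {"xp.cfg", "best_state"} - set(package)
    if missing:
        raise KeyError(f"not an audiocraft-style package, missing: {sorted(missing)}")
    return package["xp.cfg"], package["best_state"]


# Typical use (not run here; needs torch and a downloaded, trusted file):
#   package = torch.load("state_dict.bin", map_location="cpu", weights_only=False)
#   cfg_blob, tensors = split_package(package)
```

Because both parts come out of the same pickle, a safetensors conversion would have to store the cfg blob somewhere else, which is the custom-loader work this mirror defers.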

With PyTorch 2.6+, `torch.load` defaults to `weights_only=True`, which rejects these pickles (numpy scalars in `xp.cfg`). MAESTRO's runner wraps the load in a `_TorchLoadWeightsOnlyShim` context manager; vanilla audiocraft users on torch ≥ 2.6 will hit the same issue and need a similar shim.
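For readers outside MAESTRO, one possible shape for such a shim is sketched below. It is not MAESTRO's `_TorchLoadWeightsOnlyShim`, whose implementation is not shown here; the name `default_kwarg` is hypothetical. The sketch assumes the audiocraft loader calls `torch.load` without passing `weights_only` explicitly, so that changing the default is enough.

```python
import contextlib
import functools


@contextlib.contextmanager
def default_kwarg(module, func_name, **defaults):
    """Temporarily give `module.func_name` new keyword defaults.

    Explicit keyword arguments from the caller still win; the original
    function is restored on exit, even if an exception is raised.
    """
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        for key, value in defaults.items():
            kwargs.setdefault(key, value)
        return original(*args, **kwargs)

    setattr(module, func_name, wrapper)
    try:
        yield
    finally:
        setattr(module, func_name, original)


# Intended use (not run here; requires torch and the vendored audiocraft):
#   import torch
#   from audiocraft.models import MelodyFlow
#   with default_kwarg(torch, "load", weights_only=False):
#       model = MelodyFlow.get_pretrained("AEmotionStudio/melodyflow-models")
```

Loading with `weights_only=False` lets the pickle execute arbitrary code, so a shim like this should only wrap checkpoints from a source you trust, and keeping the `with` block as narrow as possible limits how long the relaxed default is in effect.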
## Loading

```python
# Requires the facebook/MelodyFlow Space's audiocraft subtree on PYTHONPATH
# (the upstream audiocraft PyPI release does NOT include MelodyFlow).
import torch
import torchaudio
from audiocraft.models import MelodyFlow

model = MelodyFlow.get_pretrained('AEmotionStudio/melodyflow-models', device='cuda')

# Generate from text alone:
model.set_generation_params(solver='midpoint', steps=64, duration=10.0)
wav = model.generate(descriptions=['cinematic strings'])

# OR edit a source clip via regularized latent inversion:
src, sr = torchaudio.load('source.wav')  # shape: (channels, samples)
if sr != model.sample_rate:
    # Assumption: the clip should match the model's sample rate before encoding
    src = torchaudio.functional.resample(src, sr, model.sample_rate)
if src.shape[0] == 1:  # MelodyFlow's EnCodec is stereo, so duplicate a mono channel
    src = src.repeat(2, 1)
src = src.unsqueeze(0).to('cuda')  # add a batch dimension

with torch.no_grad():
    prompt_tokens = model.encode_audio(src)

model.set_editing_params(solver='euler', steps=25, regularize=True,
                         regularize_iters=4, lambda_kl=0.2)
edited = model.edit(prompt_tokens=prompt_tokens,
                    descriptions=['solo piano with reverb'],
                    src_descriptions=['gentle arpeggio'])
torchaudio.save('edited.wav', edited[0].cpu(), model.sample_rate)
```
## Citation

MelodyFlow is described in:

> Le Lan, G., Nagaraja, V., Chang, E., Kant, D., Ni, Z., Shi, Y., Iandola, F., & Chandra, V. (2024). **High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching**. arXiv:2407.03648.