| # MelodyFlow: High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching |
|
|
| AudioCraft provides the code and models for MelodyFlow, [High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching][arxiv]. |
|
|
| MelodyFlow is a text-guided music generation and editing model capable of generating high-quality stereo samples conditioned on text descriptions. |
| It is a Flow Matching Diffusion Transformer trained over a 48 kHz stereo (resp. 32 kHz mono) quantizer-free EnCodec tokenizer sampled at 25 Hz (resp. 20 Hz). |
| Unlike prior work on Flow Matching for music generation such as [MusicFlow: Cascaded Flow Matching for Text Guided Music Generation](https://openreview.net/forum?id=kOczKjmYum), |
| MelodyFlow doesn't require model cascading, which makes it very convenient for music editing. |
|
|
| Check out our [sample page][melodyflow_samples] or test the available demo! |
|
|
| We use 16K hours of licensed music to train MelodyFlow. Specifically, we rely on an internal dataset |
| of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data. |
|
|
| ## Local Inference Warning |
|
|
| If you are running MelodyFlow locally, the main failure mode we hit was not prompt quality or solver choice. The real problem was using the wrong code path. |
|
|
| - Successful local text-to-music generation depended on using the official MelodyFlow Space implementation or a maintained fork of it. |
| - A stale generic AudioCraft checkout can produce structurally valid files that still sound like buzz or hum because the latent contract does not match the released MelodyFlow implementation. |
| - On PyTorch 2.6 and newer, trusted local checkpoint loads may require `weights_only=False`. |
|
|
| Read the sections below before debugging sampler settings. |
|
|
| ### Known Good Local Shape |
|
|
| The local inference path that worked reliably had these pieces: |
|
|
| 1. a dedicated MelodyFlow Python environment |
| 2. a local checkout of the official MelodyFlow Space or a maintained fork |
| 3. a local checkpoint directory for `facebook/melodyflow-t24-30secs` |
| 4. imports resolved from the Space checkout, not from an older generic AudioCraft clone |
| 5. a trusted checkpoint load path that can force `weights_only=False` on PyTorch 2.6+ |
|
|
| If any of those pieces differ, treat that as a setup issue first. |
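Assuming a single root directory for everything MelodyFlow-related (the `MELODYFLOW_ROOT` variable and the layout below are illustrative conventions, not part of the official release), the environment piece can be sketched as:

```shell
# Illustrative layout; adjust MELODYFLOW_ROOT for your machine.
ROOT="${MELODYFLOW_ROOT:-$HOME/melodyflow}"

# 1. dedicated Python environment used only for MelodyFlow
python3 -m venv "$ROOT/env"
. "$ROOT/env/bin/activate"

# 2./3. expected siblings under the same root (cloned/downloaded separately):
#    $ROOT/space                              -- MelodyFlow Space checkout or maintained fork
#    $ROOT/checkpoints/melodyflow-t24-30secs  -- state_dict.bin + compression_state_dict.bin
echo "using interpreter: $(command -v python)"
```

Keeping the checkout, checkpoints, and environment under one root makes it harder to accidentally import from a stale clone elsewhere on disk.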
|
|
|
|
| ## Model Card |
|
|
| See [the model card](../model_cards/MELODYFLOW_MODEL_CARD.md). |
|
|
|
|
| ## Installation |
|
|
| Please follow the AudioCraft installation instructions from the [README](../README.md). |
|
|
| AudioCraft requires a GPU with at least 16 GB of memory for running inference with the medium-sized models (~1.5B parameters). |
|
|
| ## Usage |
|
|
We currently offer two ways to interact with MelodyFlow:
| 1. You can use the gradio demo locally by running [`python -m demos.melodyflow_app --share`](../demos/melodyflow_app.py). |
| 2. You can play with MelodyFlow by running the jupyter notebook at [`demos/melodyflow_demo.ipynb`](../demos/melodyflow_demo.ipynb) locally (also works on CPU). |
|
|
| ## Local Fork Maintenance Notes |
|
|
| If you maintain a derived MelodyFlow repo for local development or deployment, the practical git layout is the same as any normal fork workflow: |
|
|
| 1. Keep the official MelodyFlow Space as `upstream`. |
| 2. Keep your writable Hugging Face repo as `origin`. |
| 3. Rebase or merge from `upstream/main` on a regular cadence so compatibility fixes do not drift. |
|
|
| Example remote layout: |
|
|
| ```bash |
| git remote rename origin upstream |
| git remote add origin https://huggingface.co/ericleigh007/MelodyFlow |
| git fetch upstream |
| ``` |
|
|
| ### PyTorch 2.6 Checkpoint Compatibility |
|
|
| Some local consumers loading older MelodyFlow checkpoints under PyTorch 2.6 or newer may need to override the new default `torch.load(..., weights_only=True)` behavior. |
|
|
For trusted local checkpoint files, a compatibility wrapper like the following may be needed:
|
|
```python
import torch

_original_load = torch.load

def trusted_load(*args, **kwargs):
    # Only for checkpoints from a trusted local source: allow full
    # unpickling of non-tensor objects (e.g. omegaconf configs).
    kwargs.setdefault("weights_only", False)
    return _original_load(*args, **kwargs)

# Install the wrapper so downstream loading code picks it up.
torch.load = trusted_load
```
|
|
| Without that override, loading can fail with errors involving `omegaconf.dictconfig.DictConfig` or other non-tensor objects serialized by older releases. |
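The `setdefault` pattern above can be exercised without PyTorch installed; the stand-in `original_load` below is a hypothetical placeholder for `torch.load`, used only to show that the wrapper changes the default while still honoring an explicit `weights_only` argument:

```python
def original_load(path, *, weights_only=True):
    # Stand-in for torch.load: reports which mode a call ended up using.
    return (path, weights_only)

def trusted_load(*args, **kwargs):
    # Default to full unpickling for trusted local files, but keep
    # any weights_only value the caller passes explicitly.
    kwargs.setdefault("weights_only", False)
    return original_load(*args, **kwargs)

print(trusted_load("ckpt.bin"))                     # ('ckpt.bin', False)
print(trusted_load("ckpt.bin", weights_only=True))  # ('ckpt.bin', True)
```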
|
|
| ### Local Regression Checks |
|
|
| If local outputs suddenly regress into buzz, hum, or other clearly invalid audio, check these before tuning solver parameters: |
|
|
| 1. Confirm you are importing the official `MelodyFlow` class and not reconstructing the model from an older generic AudioCraft checkout. |
| 2. Confirm the checkpoint directory still includes both `state_dict.bin` and `compression_state_dict.bin`. |
| 3. Confirm the local runner or application is still using the intended fork checkout instead of a stale clone elsewhere on disk. |
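The file and import checks above can be automated with a small sketch; the helper names and paths here are assumptions to adapt, not part of the release:

```python
import importlib.util
from pathlib import Path

REQUIRED_FILES = {"state_dict.bin", "compression_state_dict.bin"}

def checkpoint_complete(ckpt_dir: Path) -> bool:
    """Check 2: both released checkpoint files are still present."""
    if not ckpt_dir.is_dir():
        return False
    return REQUIRED_FILES <= {p.name for p in ckpt_dir.iterdir()}

def module_under(module_name: str, expected_root: Path) -> bool:
    """Checks 1/3: the module resolves from the intended checkout,
    not from a stale clone elsewhere on disk."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return False
    return expected_root.resolve() in Path(spec.origin).resolve().parents
```

For example, `module_under("audiocraft", Path("~/melodyflow/space").expanduser())` (with a path adjusted to your setup) should return `True` before you start debugging solver settings.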
|
|
| ## API |
|
|
We provide a simple API and one pre-trained model:
| - `facebook/melodyflow-t24-30secs`: 1B model, text to music, generates 30-second samples - [🤗 Hub](https://huggingface.co/facebook/melodyflow-t24-30secs) |
|
|
See below for a quick example of using the API.
|
|
| ```python |
| import torchaudio |
| from audiocraft.models import MelodyFlow |
| from audiocraft.data.audio import audio_write |
| |
| model = MelodyFlow.get_pretrained('facebook/melodyflow-t24-30secs') |
| descriptions = ['disco beat', 'energetic EDM', 'funky groove'] |
| wav = model.generate(descriptions) # generates 3 samples. |
| |
| for idx, one_wav in enumerate(wav): |
| # Will save under {idx}.wav, with loudness normalization at -14 db LUFS. |
| audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True) |
| ``` |
|
|
| ## Training |
|
|
| Coming later... |
|
|
| ## Citation |
| ``` |
| @misc{lan2024high, |
| title={High fidelity text-guided music generation and editing via single-stage flow matching}, |
| author={Le Lan, Gael and Shi, Bowen and Ni, Zhaoheng and Srinivasan, Sidd and Kumar, Anurag and Ellis, Brian and Kant, David and Nagaraja, Varun and Chang, Ernie and Hsu, Wei-Ning and others}, |
| year={2024}, |
| eprint={2407.03648}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.SD} |
| } |
| ``` |
|
|
| ## License |
|
|
| See license information in the [model card](../model_cards/MELODYFLOW_MODEL_CARD.md). |
|
|
| [arxiv]: https://arxiv.org/pdf/2407.03648 |
[melodyflow_samples]: https://melodyflow.github.io/
|
|