AEmotionStudio committed
Commit 347ba3a · verified · 1 parent: d595b2c

Upload README.md

Files changed (1)
  1. README.md +61 -0
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - audiocraft
+ - melodyflow
+ - music-generation
+ - music-editing
+ - flow-matching
+ language:
+ - en
+ ---
+
+ # MelodyFlow — AEmotionStudio mirror
+
+ 1:1 mirror of [facebook/melodyflow-t24-30secs](https://huggingface.co/facebook/melodyflow-t24-30secs). Used by the MAESTRO / Æmotion Studio AI Workstation's **MelodyFlow** panel (Design → MelodyFlow).
+
+ ## License — Non-Commercial
+
+ **Weights:** CC-BY-NC-4.0. Generated outputs may NOT be used in commercial projects, paid releases, or client work.
+
+ **Code (audiocraft):** MIT. MelodyFlow's inference code lives in the [`facebook/MelodyFlow`](https://huggingface.co/spaces/facebook/MelodyFlow) Hugging Face Space; Meta uploaded it there but never merged it into audiocraft `main`. MAESTRO vendors that Space's `audiocraft/` subtree under `backend/ai/melodyflow_pkg/`. The non-commercial clause attaches only to the weights and to anything derived from running them.
+
+ ## Format
+
+ This mirror keeps the upstream `.bin` layout (PyTorch pickle) verbatim: `state_dict.bin` (the flow-matching DiT language model) plus `compression_state_dict.bin` (the EnCodec compression model, 2-channel / 32 kHz). We do NOT convert to safetensors here because the vendored audiocraft loader expects pickled `{xp.cfg, best_state}` packages and reads the OmegaConf cfg blob alongside the tensor dict in one `torch.load` call. Splitting the cfg into a sidecar file would require a custom loader, so that work is deferred.
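As a rough picture of that layout, the sketch below round-trips a package with the two keys named above through a single load call. It uses the standard-library `pickle` module and plain Python values as stand-ins for `torch.load`, the OmegaConf config, and the tensors. The field names inside `xp.cfg` and `best_state` are made up for illustration and are not the real MelodyFlow schema.

```python
import io
import pickle

# Hypothetical stand-in for a checkpoint package: one dict carrying both
# the config blob and the weights, as the paragraph above describes.
package = {
    "xp.cfg": {"sample_rate": 32000, "channels": 2},   # stand-in for the OmegaConf cfg
    "best_state": {"layer.weight": [0.1, 0.2, 0.3]},   # stand-in for the tensor dict
}

buffer = io.BytesIO()
pickle.dump(package, buffer)

# A single load call returns config and weights together, which is why
# moving the cfg into a sidecar file would need a different loader.
buffer.seek(0)
loaded = pickle.load(buffer)
cfg, state = loaded["xp.cfg"], loaded["best_state"]
```

A safetensors file stores tensors plus string-only metadata, so an arbitrary pickled config object like this would have to be serialized separately.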
+
+ PyTorch 2.6+ defaults `torch.load` to `weights_only=True`, which rejects these pickles because `xp.cfg` contains NumPy scalars. MAESTRO's runner wraps the load in a `_TorchLoadWeightsOnlyShim` context manager; vanilla audiocraft users on torch ≥ 2.6 will hit the same issue and need a similar shim.
+
+ ## Loading
+
+ ```python
+ # Requires the facebook/MelodyFlow Space's audiocraft subtree on PYTHONPATH
+ # (the upstream audiocraft PyPI release does NOT include MelodyFlow).
+ import torch
+ import torchaudio
+ from audiocraft.models import MelodyFlow
+
+ model = MelodyFlow.get_pretrained('AEmotionStudio/melodyflow-models', device='cuda')
+
+ # Generate from text alone:
+ model.set_generation_params(solver='midpoint', steps=64, duration=10.0)
+ wav = model.generate(descriptions=['cinematic strings'])
+
+ # OR edit a source clip via regularized latent inversion:
+ src, sr = torchaudio.load('source.wav')  # shape: (channels, samples)
+ if sr != model.sample_rate:
+     # Assumption: encode_audio expects audio at the model's sample rate.
+     src = torchaudio.functional.resample(src, sr, model.sample_rate)
+ if src.shape[0] == 1:  # MelodyFlow's EnCodec is stereo, so duplicate mono
+     src = src.repeat(2, 1)
+ src = src.unsqueeze(0).to('cuda')  # add a batch dimension
+ with torch.no_grad():
+     prompt_tokens = model.encode_audio(src)
+ model.set_editing_params(solver='euler', steps=25, regularize=True,
+                          regularize_iters=4, lambda_kl=0.2)
+ edited = model.edit(prompt_tokens=prompt_tokens,
+                     descriptions=['solo piano with reverb'],
+                     src_descriptions=['gentle arpeggio'])
+ torchaudio.save('edited.wav', edited[0].cpu(), model.sample_rate)
+ ```
+
+ ## Citation
+
+ MelodyFlow is described in:
+
+ > Le Lan, G., Nagaraja, V., Chang, E., Kant, D., Ni, Z., Shi, Y., Iandola, F., & Chandra, V. (2024). **High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching**. arXiv:2407.03648.