Ashiedu committed on
Commit
8fa1fd6
·
verified ·
1 Parent(s): 6b214f0

Update README.md

Files changed (1)
  1. README.md +306 -57
README.md CHANGED
@@ -1,92 +1,341 @@
1
  ---
2
  license: apache-2.0
3
- language:
4
- - en
 
5
  tags:
6
  - music-generation
7
- - audio
8
- - onnx
9
- - directml
10
- - synesthesia
11
  - magenta
 
 
 
 
12
  - performance-rnn
 
 
 
 
13
  - musicvae
 
 
14
  - ddsp
15
  library_name: onnxruntime
16
- pipeline_tag: audio-to-audio
17
  ---
18
 
19
  # Synesthesia — AI Music Models
20
 
21
- ONNX model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia), a cyber-physical synthesizer and 3D/4D signal workstation.
 
22
 
23
- ## Models
 
 
24
 
25
- | Model | Source | Format | Size | Task |
26
- |-------|--------|--------|------|------|
27
- | Performance RNN | Magenta | ONNX | ~20MB | Note-level MIDI generation |
28
- | MusicVAE (Encoder) | Magenta | ONNX | ~80MB | Latent music encoding |
29
- | MusicVAE (Decoder) | Magenta | ONNX | ~80MB | Latent music decoding |
30
- | DDSP (Encoder) | Magenta | ONNX | ~30MB | Audio → harmonic params |
31
- | DDSP (Decoder) | Magenta | ONNX | ~30MB | Harmonic params → audio |
32
- | SpectroStream (Encoder) | Magenta RT | ONNX | TBD | Audio → spectral tokens |
33
- | SpectroStream (Decoder) | Magenta RT | ONNX | TBD | Spectral tokens → audio |
34
- | MusicCoCa (Text) | Google | ONNX | TBD | Text → music embedding |
35
- | MusicCoCa (Audio) | Google | ONNX | TBD | Audio → music embedding |
36
- | Gemma-3N | Google | ONNX | TBD | Vision → mood/energy JSON |
37
 
38
- ## Runtime
39
 
40
- All models run locally via **ONNX Runtime with DirectML** (GPU acceleration on Windows).
 
 
 
 
 
41
 
42
- ```toml
43
- # Cargo.toml
44
- [dependencies]
45
- ort = { version = "2", features = ["directml"] }
46
- ```
47
 
48
- ## Download
49
 
50
- ```python
51
- from huggingface_hub import snapshot_download
52
- snapshot_download("Ashiedu/Synesthesia", local_dir="./models")
53
  ```
54
 
55
  ```rust
56
- // Rust (using hf-hub crate)
57
  use hf_hub::api::sync::Api;
58
- let api = Api::new().unwrap();
59
- let repo = api.model("Ashiedu/Synesthesia".to_string());
60
- let model_path = repo.get("perfrnn/model.onnx").unwrap();
 
 
 
 
 
 
 
61
  ```
62
 
63
- ## Structure
64
 
 
 
 
 
 
 
 
 
 
 
 
 
65
  ```
66
- ├── perfrnn/
67
- │ └── model.onnx
68
- ├── musicvae/
69
- │ ├── encoder.onnx
70
- │ └── decoder.onnx
71
- ├── ddsp/
72
- │ ├── encoder.onnx
73
- │ └── decoder.onnx
74
- ├── spectrostream/
75
- │ ├── encoder.onnx
76
- │ └── decoder.onnx
77
- ├── musiccoca/
78
- │ ├── text.onnx
79
- │ └── audio.onnx
80
- ├── gemma3n/
81
- │ └── model.onnx
82
- └── manifest.json
83
  ```
84
 
85
  ## License
86
 
87
- Apache 2.0 — model weights may have additional upstream licenses (see individual model directories).
 
 
 
 
 
 
 
88
 
89
  ## Links
90
 
91
- - **GitHub:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia)
92
- - **Roadmap:** See GitHub Issues with `lane:ml` label
 
 
 
 
1
  ---
2
  license: apache-2.0
3
+ task_categories:
4
+ - audio-to-audio
5
+ - text-to-audio
6
  tags:
7
  - music-generation
 
 
 
 
8
  - magenta
9
+ - magenta-rt
10
+ - onnx
11
+ - burn
12
+ - llama-cpp
13
  - performance-rnn
14
+ - melody-rnn
15
+ - drums-rnn
16
+ - improv-rnn
17
+ - polyphony-rnn
18
  - musicvae
19
+ - groovae
20
+ - piano-genie
21
  - ddsp
22
+ - gansynth
23
+ - nsynth
24
+ - coconet
25
+ - music-transformer
26
+ - onsets-and-frames
27
+ - spectrostream
28
+ - musiccoca
29
+ - synesthesia
30
+ - directml
31
+ - vulkan
32
+ - wgpu
33
+ - audio
34
+ - midi
35
+ language:
36
+ - en
37
  library_name: onnxruntime
 
38
  ---
39
 
40
  # Synesthesia — AI Music Models
41
 
42
+ ONNX and GGUF model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia),
43
+ a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.
44
 
45
+ Synesthesia brings together every open-weights model from **Magenta Classic** and
46
+ **Magenta RT** under one repo, exportable to ONNX for local inference and continuously
47
+ fine-tunable via free Google Colab notebooks.
48
 
49
+ ---
50
 
51
+ ## Inference Runtimes
52
 
53
+ | Runtime | Models | Backend | Notes |
54
+ |---------|--------|---------|-------|
55
+ | **Burn wgpu** | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
56
+ | **ORT + DirectML** | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
57
+ | **llama.cpp + Vulkan** | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
58
+ | **Magenta RT (JAX)** | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |
59
 
60
+ Vulkan runs on AMD GPUs under Windows 11 without ROCm. The local runtimes (Burn, ORT, and llama.cpp) target the RX 6700 XT.
 
 
 
 
61
 
62
+ ---
63
 
64
+ ## Model Inventory
65
+
66
+ ### Magenta RT (Real-Time Audio Generation)
67
+
68
+ Magenta RT is a pipeline of three components: SpectroStream (audio codec),
69
+ MusicCoCa (style embeddings), and an encoder-decoder transformer LLM.
70
+ Together they form the only open-weights model supporting real-time,
71
+ continuous musical audio generation.
72
+
73
+ It is an 800 million parameter autoregressive transformer trained on
74
+ ~190k hours of stock music. It uses 38% fewer parameters
75
+ than Stable Audio Open and 77% fewer than MusicGen Large.
76
+
77
+ | ID | Model | Format | Task | Synesthesia Role |
78
+ |----|-------|--------|------|-----------------|
79
+ | MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
80
+ | MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48kHz stereo, 25Hz, 64 RVQ) | Audio tokenizer |
81
+ | MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48kHz stereo audio | Audio detokenizer |
82
+ | MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
83
+ | MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |
84
+
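To make the SpectroStream figures in the table concrete, the sketch below works out the token throughput they imply. It assumes that "25Hz" is the codec frame rate and that "64 RVQ" is the number of residual codebook tokens emitted per frame. The 2-second chunk length is purely illustrative and is not taken from this repository.

```python
# Figures from the table above; how they are interpreted is an assumption.
SAMPLE_RATE_HZ = 48_000   # audio sample rate per channel
FRAME_RATE_HZ = 25        # assumed: codec frames per second
RVQ_DEPTH = 64            # assumed: RVQ tokens emitted per frame


def tokens_per_second(frame_rate_hz: int = FRAME_RATE_HZ, rvq_depth: int = RVQ_DEPTH) -> int:
    """Discrete tokens needed to represent one second of audio."""
    return frame_rate_hz * rvq_depth


def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ, frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    """Audio samples (per channel) summarized by a single codec frame."""
    return sample_rate_hz // frame_rate_hz


def tokens_for_chunk(seconds: float) -> int:
    """Tokens for a chunk of the given duration (hypothetical chunk length)."""
    return round(seconds * tokens_per_second())


if __name__ == "__main__":
    print(tokens_per_second())    # 1600
    print(samples_per_frame())    # 1920
    print(tokens_for_chunk(2.0))  # 3200
```

Under these assumptions, each codec frame stands in for 1920 samples per channel, which is where the compression comes from.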
85
+ **Finetuning:** Runs on a free Colab TPU v2-8 via `Magenta_RT_Finetune.ipynb`, so the
86
+ model can be adapted to your own audio catalog. The official Colab demos cover live
87
+ generation, finetuning, and live audio injection, which mixes user audio with the
88
+ model output and feeds the mix back as context for the next generated chunk.
89
+
90
+ ---
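The live audio injection loop mentioned in the finetuning note can be sketched as follows. This is a minimal illustration under stated assumptions, not Magenta RT's actual API: `generate_chunk`, `inject`, `run_session`, and `user_gain` are hypothetical names, and audio is modeled as plain lists of mono float samples.

```python
from typing import Callable, List

Audio = List[float]  # mono samples in [-1.0, 1.0]; a simplification


def inject(model_out: Audio, user_in: Audio, user_gain: float = 0.5) -> Audio:
    """Mix user audio into the model output, clipping to [-1, 1]."""
    if len(model_out) != len(user_in):
        raise ValueError("chunks must have the same length")
    mixed = (m + user_gain * u for m, u in zip(model_out, user_in))
    return [max(-1.0, min(1.0, s)) for s in mixed]


def run_session(
    generate_chunk: Callable[[Audio], Audio],  # hypothetical stand-in for the model
    user_chunks: List[Audio],
    initial_context: Audio,
) -> List[Audio]:
    """Generate one chunk per user chunk, feeding each mix back as context."""
    context = initial_context
    outputs = []
    for user_in in user_chunks:
        model_out = generate_chunk(context)
        outputs.append(model_out)
        # The mix, not the raw model output, becomes the next context.
        context = inject(model_out, user_in)
    return outputs
```

The key design point is that the context for the next chunk is the mix, so the user's audio steers later generation even though it is never played back as model output.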
91
+
92
+ ### Magenta Classic — MIDI / Symbolic
93
+
94
+ MusicRNN implements Magenta's LSTM-based language models:
95
+ MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.
96
+
97
+ | ID | Model | Format | Task | Synesthesia Role |
98
+ |----|-------|--------|------|-----------------|
99
+ | MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
100
+ | MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
101
+ | MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
102
+ | MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
103
+ | MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot-inspired) | Harmonic voice generation |
104
+ | MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE — melody, drum, trio loops | Latent interpolation, style morphing |
105
+ | MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
106
+ | MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
107
+ | MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
108
+ | MC-010 | Coconet | ONNX | Counterpoint by convolution — complete partial scores | Harmony / counterpoint filler |
109
+
110
+ ---
111
+
112
+ ### Magenta Classic — Audio / Timbre
113
+
114
+ | ID | Model | Format | Task | Synesthesia Role |
115
+ |----|-------|--------|------|-----------------|
116
+ | MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
117
+ | MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
118
+ | MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis |
119
+ | MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis |
120
+ | MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
121
+ | MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
122
+ | MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
123
+
124
+ ---
125
+
126
+ ### LLM / Vision Control
127
+
128
+ | ID | Model | Format | Task | Synesthesia Role |
129
+ |----|-------|--------|------|-----------------|
130
+ | LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |
131
+
132
+ **Format tiers:**
133
+ - `q4_k_m.gguf` — default (recommended, ~1.5GB)
134
+ - `q2_k.gguf` — lite tier (fastest, smallest)
135
+ - `f16.gguf` — full quality reference
136
+
137
+ **Runtime:** `llama-cpp-v3` Rust crate with Vulkan backend.
138
+ Same stack as LM Studio — no ROCm, no CUDA needed on Windows.
139
+
140
+ ---
141
+
142
+ ## Repository Structure
143
+
144
+ ```
145
+ Ashiedu/Synesthesia/
146
+
147
+ ├── manifest.json ← authoritative model registry
148
+
149
+ ├── magenta_rt/
150
+ │ ├── llm/ ← MRT-001: JAX checkpoint + ONNX export
151
+ │ ├── spectrostream/
152
+ │ │ ├── encoder_fp32.onnx
153
+ │ │ ├── encoder_fp16.onnx
154
+ │ │ ├── decoder_fp32.onnx
155
+ │ │ └── decoder_fp16.onnx
156
+ │ └── musiccoca/
157
+ │   ├── text_fp32.onnx
158
+ │   ├── text_fp16.onnx
159
+ │   ├── audio_fp32.onnx
160
+ │   └── audio_fp16.onnx
161
+
162
+ ├── midi/
163
+ │ ├── perfrnn/ ← MC-001: fp32 / fp16 / int8
164
+ │ ├── melody_rnn/ ← MC-002
165
+ │ ├── drums_rnn/ ← MC-003
166
+ │ ├── improv_rnn/ ← MC-004
167
+ │ ├── polyphony_rnn/ ← MC-005
168
+ │ ├── musicvae/ ← MC-006: encoder + decoder
169
+ │ ├── groovae/ ← MC-007
170
+ │ ├── midime/ ← MC-008
171
+ │ ├── music_transformer/ ← MC-009
172
+ │ └── coconet/ ← MC-010
173
+
174
+ ├── audio/
175
+ │ ├── gansynth/ ← MA-001: fp32 / fp16
176
+ │ ├── nsynth/ ← MA-002
177
+ │ ├── ddsp/ ← MA-003+004: encoder + decoder
178
+ │ ├── piano_genie/ ← MA-005
179
+ │ ├── onsets_and_frames/ ← MA-006
180
+ │ └── spice/ ← MA-007
181
+
182
+ └── llm/
183
+   └── gemma3n_e2b/
184
+     ├── q4_k_m.gguf ← LV-001: default
185
+     ├── q2_k.gguf
186
+     └── f16.gguf
187
  ```
188
 
189
+ Each subdirectory contains a `README.md` with input/output shapes,
190
+ export commands, and Burn compatibility status.
191
+
192
+ ---
193
+
194
+ ## Quality Tiers (ONNX models)
195
+
196
+ | Tier | Suffix | VRAM est. | Use case |
197
+ |------|--------|-----------|----------|
198
+ | Full | `_fp32.onnx` | ~2–4× Half | Reference quality, CI validation |
199
+ | **Half** | `_fp16.onnx` | Baseline | **Default — recommended for RX 6700 XT** |
200
+ | Lite | `_int8.onnx` | ~0.5× Half | Lowest latency (MIDI models only) |
201
+
202
+ ---
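As a rough sketch of how the tier suffixes could be applied, the helper below builds a repo-relative path from a model directory, a file stem, and a tier. The function and dictionary names are hypothetical. It assumes the `<stem>_<precision>.onnx` pattern shown in the repository tree; the pull examples in this README use a bare `fp16.onnx` filename instead, so check the actual layout before relying on it.

```python
# Tier names and suffixes follow the Quality Tiers table.
TIER_SUFFIX = {
    "full": "_fp32.onnx",
    "half": "_fp16.onnx",  # documented default
    "lite": "_int8.onnx",
}


def tier_path(model_dir: str, stem: str, tier: str = "half") -> str:
    """Build a path such as 'magenta_rt/spectrostream/encoder_fp16.onnx'."""
    if tier not in TIER_SUFFIX:
        raise ValueError(f"unknown tier: {tier!r}")
    # Per the table, the int8 tier is listed for MIDI models only.
    if tier == "lite" and not model_dir.startswith("midi/"):
        raise ValueError("the lite (int8) tier is only listed for MIDI models")
    return f"{model_dir.rstrip('/')}/{stem}{TIER_SUFFIX[tier]}"


if __name__ == "__main__":
    print(tier_path("magenta_rt/spectrostream", "encoder"))
    print(tier_path("midi/musicvae", "decoder", tier="lite"))
```

Rejecting the lite tier for non-MIDI directories up front gives a clearer error than a failed download.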
203
+
204
+ ## Pulling Models in Rust
205
+
206
  ```rust
 
207
  use hf_hub::api::sync::Api;
208
+
209
+ pub fn pull(repo_path: &str) -> anyhow::Result<std::path::PathBuf> {
210
+     let api = Api::new()?;
211
+     let repo = api.model("Ashiedu/Synesthesia".to_string());
212
+     // Downloads on first use, then serves from ~/.cache/huggingface/hub/
213
+     Ok(repo.get(repo_path)?)
214
+ }
215
+
216
+ // Example (call from a function that returns anyhow::Result)
217
+ let path = pull("midi/perfrnn/fp16.onnx")?;
218
  ```
219
 
220
+ ## Pulling Models in Python
221
 
222
+ ```python
223
+ from huggingface_hub import snapshot_download, hf_hub_download
224
+
225
+ # Pull everything
226
+ snapshot_download("Ashiedu/Synesthesia", local_dir="./models")
227
+
228
+ # Pull one file
229
+ hf_hub_download(
230
+     repo_id="Ashiedu/Synesthesia",
231
+     filename="midi/perfrnn/fp16.onnx",
232
+     local_dir="./models",
233
+ )
234
  ```
235
+
236
+ ---
237
+
238
+ ## Export Workflow (Colab)
239
+
240
+ All models are exported from Colab and pushed here. The generic workflow:
241
+
242
+ ```python
243
+ # 1. Pull existing checkpoint (if updating); HF_TOKEN must already be defined, e.g. from Colab Secrets
244
+ from huggingface_hub import snapshot_download
245
+ snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)
246
+
247
+ # 2. Clone Magenta source
248
+ # !git clone https://github.com/magenta/magenta
249
+ # !git clone https://github.com/magenta/magenta-realtime
250
+
251
+ # 3. Export to ONNX (varies per model — see each model's README)
252
+ # Magenta Classic: tf2onnx
253
+ # Magenta RT: JAX → onnx via jax2onnx or flax export
254
+ # Gemma-3N: Unsloth → GGUF
255
+
256
+ # 4. Quantize
257
+ from onnxruntime.quantization import quantize_dynamic, QuantType
258
+ import onnxconverter_common as occ, onnx
259
+
260
+ fp32 = onnx.load("model.onnx")
261
+ fp16 = occ.convert_float_to_float16(fp32, keep_io_types=True)
262
+ onnx.save(fp16, "model_fp16.onnx")
263
+ quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
264
+
265
+ # 5. Push to HF
266
+ from huggingface_hub import HfApi
267
+ api = HfApi(token=HF_TOKEN) # set in Colab Secrets
268
+ api.upload_file(
269
+     path_or_fileobj="model_fp16.onnx",
270
+     path_in_repo="midi/perfrnn/fp16.onnx",
271
+     repo_id="Ashiedu/Synesthesia",
272
+     commit_message="MC-001 Performance RNN fp16",
273
+ )
274
  ```
275
 
276
+ **Gemini on Colab:** Point Gemini at this README and the model's subdirectory
277
+ README as context. Gemini can execute the export + push workflow without
278
+ GitHub integration — it only needs Python and your HF token in Colab Secrets.
279
+
280
+ ---
281
+
282
+ ## Burn Compatibility Tracking
283
+
284
+ A weekly CI job runs `burn-onnx ModelGen` against each exported model.
285
+ Models move from the ORT fallback to Burn as its op coverage matures.
286
+
287
+ | Model | Burn target | ORT fallback | Last checked |
288
+ |-------|------------|--------------|-------------|
289
+ | DDSP enc/dec | ✅ | ❌ | — |
290
+ | GANSynth | ✅ | ❌ | — |
291
+ | NSynth | ✅ | ❌ | — |
292
+ | Piano Genie | ✅ | ❌ | — |
293
+ | Performance RNN | 🔄 LSTM | ✅ | — |
294
+ | Melody RNN | 🔄 LSTM | ✅ | — |
295
+ | Drums RNN | 🔄 LSTM | ✅ | — |
296
+ | Improv RNN | 🔄 LSTM | ✅ | — |
297
+ | Polyphony RNN | 🔄 LSTM | ✅ | — |
298
+ | MusicVAE | 🔄 BiLSTM | ✅ | — |
299
+ | Coconet | 🔄 Conv | ✅ | — |
300
+ | Music Transformer | 🔄 Attention | ✅ | — |
301
+ | Onsets & Frames | 🔄 Conv+LSTM | ✅ | — |
302
+ | SpectroStream | 🔄 Conv | ✅ | — |
303
+ | MusicCoCa | 🔄 ViT+Transformer | ✅ | — |
304
+ | Gemma-3N | N/A — llama.cpp | ❌ | — |
305
+
306
+ ---
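The compatibility table can be read as a routing rule: use Burn where the model already converts, fall back to ORT otherwise, and send Gemma-3N to llama.cpp. The sketch below encodes that reading. The runtime labels and function names are hypothetical, and the sets simply mirror the table as committed here rather than any live CI result.

```python
# Snapshot of the table above; update as CI results change.
BURN_READY = {"DDSP enc/dec", "GANSynth", "NSynth", "Piano Genie"}
ORT_FALLBACK = {
    "Performance RNN", "Melody RNN", "Drums RNN", "Improv RNN",
    "Polyphony RNN", "MusicVAE", "Coconet", "Music Transformer",
    "Onsets & Frames", "SpectroStream", "MusicCoCa",
}
LLAMA_CPP = {"Gemma-3N"}


def pick_runtime(model: str) -> str:
    """Return a (hypothetical) runtime label for a model name from the table."""
    if model in BURN_READY:
        return "burn-wgpu"
    if model in ORT_FALLBACK:
        return "ort-directml"
    if model in LLAMA_CPP:
        return "llama.cpp-vulkan"
    raise KeyError(f"model not in compatibility table: {model}")


def promote_to_burn(model: str, burn_ready: set, ort_fallback: set) -> tuple:
    """Return new (burn_ready, ort_fallback) sets after a successful CI conversion."""
    if model not in ort_fallback:
        raise KeyError(f"{model} is not currently on the ORT fallback")
    return burn_ready | {model}, ort_fallback - {model}
```

Keeping `promote_to_burn` free of side effects means a CI script can compute the new table state and decide separately whether to commit it.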
307
+
308
+ ## Training Philosophy
309
+
310
+ **Train after the app works.** The interface ships first. Training data
311
+ is determined by what the working app actually receives as input in practice.
312
+ Fine-tune on your own audio and MIDI once the signal chain is wired.
313
+
314
+ Tentative fine-tuning order once the app is functional:
315
+ 1. Performance RNN — live MIDI from the Track Mixer
316
+ 2. MusicVAE / GrooVAE — latent interpolation between patches
317
+ 3. GANSynth — timbre generation from pitch + latent input
318
+ 4. DDSP — resynthesis of GANSynth outputs
319
+ 5. Magenta RT — full audio, conditioned on your own catalog
320
+ 6. Gemma-3N — camera → mood/energy trained on your session recordings
321
+
322
+ ---
323
+
324
  ## License
325
 
326
+ - Codebase: Apache 2.0
327
+ - Magenta Classic weights: Apache 2.0
328
+ - Magenta RT weights: Apache 2.0 with additional [bespoke terms](https://github.com/magenta/magenta-realtime/blob/main/LICENSE)
329
+ - Gemma-3N: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
330
+
331
+ Individual model directories note any additional upstream license terms.
332
+
333
+ ---
334
 
335
  ## Links
336
 
337
+ - **App:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia)
338
+ - **Magenta RT:** [magenta/magenta-realtime](https://github.com/magenta/magenta-realtime)
339
+ - **Magenta Classic:** [magenta/magenta](https://github.com/magenta/magenta)
340
+ - **HF Model Card:** [google/magenta-realtime](https://huggingface.co/google/magenta-realtime)
341
+ - **Roadmap:** GitHub Issues — `lane:ml` label