@@ -1,92 +1,341 @@
 ---
 license: apache-2.0
-language:
-  - en
 tags:
   - music-generation
-  - audio
-  - onnx
-  - directml
-  - synesthesia
   - magenta
   - performance-rnn
   - musicvae
   - ddsp
 library_name: onnxruntime
-pipeline_tag: audio-to-audio
 ---
 # Synesthesia — AI Music Models
-ONNX model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia), a cyber-physical synthesizer and 3D/4D signal workstation.
-## Models
-| Model | Source | Format | Size | Task |
-|-------|--------|--------|------|------|
-| Performance RNN | Magenta | ONNX | ~20MB | Note-level MIDI generation |
-| MusicVAE (Encoder) | Magenta | ONNX | ~80MB | Latent music encoding |
-| MusicVAE (Decoder) | Magenta | ONNX | ~80MB | Latent music decoding |
-| DDSP (Encoder) | Magenta | ONNX | ~30MB | Audio → harmonic params |
-| DDSP (Decoder) | Magenta | ONNX | ~30MB | Harmonic params → audio |
-| SpectroStream (Encoder) | Magenta RT | ONNX | TBD | Audio → spectral tokens |
-| SpectroStream (Decoder) | Magenta RT | ONNX | TBD | Spectral tokens → audio |
-| MusicCoCa (Text) | Google | ONNX | TBD | Text → music embedding |
-| MusicCoCa (Audio) | Google | ONNX | TBD | Audio → music embedding |
-| Gemma-3N | Google | ONNX | TBD | Vision → mood/energy JSON |
-## Runtime
-All models run locally via **ONNX Runtime with DirectML** (GPU acceleration on Windows).
-```toml
-# Cargo.toml
-[dependencies]
-ort = { version = "2", features = ["directml"] }
-```
-## Download
-```python
-from huggingface_hub import snapshot_download
-snapshot_download("Ashiedu/Synesthesia", local_dir="./models")
 ```
 ```rust
-// Rust (using hf-hub crate)
 use hf_hub::api::sync::Api;
-let api = Api::new().unwrap();
-let repo = api.model("Ashiedu/Synesthesia".to_string());
-let model_path = repo.get("perfrnn/model.onnx").unwrap();
 ```
-## Structure
 ```
-├── perfrnn/
-│   └── model.onnx
-├── musicvae/
-│   ├── encoder.onnx
-│   └── decoder.onnx
-├── ddsp/
-│   ├── encoder.onnx
-│   └── decoder.onnx
-├── spectrostream/
-│   ├── encoder.onnx
-│   └── decoder.onnx
-├── musiccoca/
-│   ├── text.onnx
-│   └── audio.onnx
-├── gemma3n/
-│   └── model.onnx
-└── manifest.json
 ```
 ## License
-Apache 2.0 — model weights may have additional upstream licenses (see individual model directories).
 ## Links
-- **GitHub:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia)
-- **Roadmap:** See GitHub Issues with `lane:ml` label

 ---
 license: apache-2.0
+task_categories:
+  - audio-to-audio
+  - text-to-audio
 tags:
   - music-generation
   - magenta
+  - magenta-rt
+  - onnx
+  - burn
+  - llama-cpp
   - performance-rnn
+  - melody-rnn
+  - drums-rnn
+  - improv-rnn
+  - polyphony-rnn
   - musicvae
+  - groovae
+  - piano-genie
   - ddsp
+  - gansynth
+  - nsynth
+  - coconet
+  - music-transformer
+  - onsets-and-frames
+  - spectrostream
+  - musiccoca
+  - synesthesia
+  - directml
+  - vulkan
+  - wgpu
+  - audio
+  - midi
+language:
+  - en
 library_name: onnxruntime
 ---
 # Synesthesia — AI Music Models
+ONNX and GGUF model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia),
+a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.
+Synesthesia brings together every open-weights model from **Magenta Classic** and
+**Magenta RT** under one repo, exportable to ONNX for local inference and continuously
+fine-tunable via free Google Colab notebooks.
+---
+## Inference Runtimes
+| Runtime | Models | Backend | Notes |
+|---------|--------|---------|-------|
+| **Burn wgpu** | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
+| **ORT + DirectML** | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
+| **llama.cpp + Vulkan** | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
+| **Magenta RT (JAX)** | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |
+On Windows 11, Vulkan runs on AMD GPUs without ROCm. The three local runtimes target the RX 6700 XT; Magenta RT (JAX) runs on Colab TPU / GPU instead.
+---
+## Model Inventory
+### Magenta RT (Real-Time Audio Generation)
+Magenta RT is a pipeline of three components: SpectroStream (audio codec),
+MusicCoCa (style embeddings), and an encoder-decoder transformer LLM.
+Together they form the only open-weights model supporting real-time continuous
+musical audio generation.
+The LLM is an 800 million parameter autoregressive transformer trained on
+~190k hours of stock music. It uses 38% fewer parameters
+than Stable Audio Open and 77% fewer than MusicGen Large.
+| ID | Model | Format | Task | Synesthesia Role |
+|----|-------|--------|------|-----------------|
+| MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
+| MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48kHz stereo, 25Hz, 64 RVQ) | Audio tokenizer |
+| MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48kHz stereo audio | Audio detokenizer |
+| MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
+| MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |
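The SpectroStream figures in the table (48kHz stereo, 25Hz, 64 RVQ) imply a fixed token budget per second of audio. The arithmetic below is a minimal sketch that assumes "64 RVQ" means 64 residual-quantizer codebooks per 25Hz frame; the constant and function names are illustrative and do not come from the Magenta RT codebase.

```python
# Figures copied from the SpectroStream rows of the table.
SAMPLE_RATE_HZ = 48_000   # audio sample rate
FRAME_RATE_HZ = 25        # token frames per second
RVQ_LEVELS = 64           # assumed: codebooks (tokens) per frame


def tokens_per_second() -> int:
    """Discrete tokens produced per second of audio."""
    return FRAME_RATE_HZ * RVQ_LEVELS


def samples_per_frame() -> int:
    """Audio samples per channel covered by one token frame."""
    return SAMPLE_RATE_HZ // FRAME_RATE_HZ


def token_grid_shape(seconds: float) -> tuple:
    """(frames, rvq_levels) shape of the token grid for a clip."""
    return (int(seconds * FRAME_RATE_HZ), RVQ_LEVELS)


print(tokens_per_second())    # 1600
print(samples_per_frame())    # 1920
print(token_grid_shape(2.0))  # (50, 64)
```

Under that reading, each frame summarizes 1920 samples per channel, and one second of audio becomes 1600 tokens.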
+**Finetuning:** Run `Magenta_RT_Finetune.ipynb` on a free Colab TPU v2-8 to adapt
+the model to your own audio catalog. The official Colab demos cover live generation,
+finetuning, and live audio injection, where user audio is mixed with the model's
+output and fed back as context for the next generation chunk.
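The audio injection loop described above can be sketched on raw samples. This is an illustration under stated assumptions, not the Magenta RT implementation: `mix_for_context`, `next_context`, and `user_gain` are hypothetical names, samples are plain floats in [-1.0, 1.0], and in the real pipeline the mixed audio would presumably be tokenized by the SpectroStream encoder before it becomes context.

```python
def mix_for_context(model_chunk, user_chunk, user_gain=0.5):
    """Blend user audio into the model's latest output chunk.

    Both inputs are equal-length lists of float samples in [-1.0, 1.0].
    """
    if len(model_chunk) != len(user_chunk):
        raise ValueError("chunks must have the same length")
    mixed = []
    for m, u in zip(model_chunk, user_chunk):
        s = m + user_gain * u
        mixed.append(max(-1.0, min(1.0, s)))  # hard-clip to the valid range
    return mixed


def next_context(context, mixed_chunk, max_len):
    """Append the mixed chunk and keep only the newest max_len samples."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return (context + mixed_chunk)[-max_len:]


mixed = mix_for_context([0.5, -0.5], [1.0, -1.0], user_gain=0.5)
context = next_context([0.0, 0.1], mixed, max_len=3)
```

The rolling window in `next_context` stands in for whatever context length the model actually uses, which this README does not state.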
+---
+### Magenta Classic — MIDI / Symbolic
+MusicRNN implements Magenta's LSTM-based language models:
+MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.
+| ID | Model | Format | Task | Synesthesia Role |
+|----|-------|--------|------|-----------------|
+| MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
+| MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
+| MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
+| MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
+| MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot-inspired) | Harmonic voice generation |
+| MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE — melody, drum, trio loops | Latent interpolation, style morphing |
+| MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
+| MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
+| MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
+| MC-010 | Coconet | ONNX | Counterpoint by convolution — complete partial scores | Harmony / counterpoint filler |
+---
+### Magenta Classic — Audio / Timbre
+| ID | Model | Format | Task | Synesthesia Role |
+|----|-------|--------|------|-----------------|
+| MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
+| MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
+| MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis |
+| MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis |
+| MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
+| MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
+| MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
+---
+### LLM / Vision Control
+| ID | Model | Format | Task | Synesthesia Role |
+|----|-------|--------|------|-----------------|
+| LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |
+**Format tiers:**
+- `q4_k_m.gguf` — default (recommended, ~1.5GB)
+- `q2_k.gguf` — lite tier (fastest, smallest)
+- `f16.gguf` — full quality reference
+**Runtime:** `llama-cpp-v3` Rust crate with Vulkan backend.
+Same stack as LM Studio — no ROCm, no CUDA needed on Windows.
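Since LV-001 is expected to emit structured JSON for mood/energy/key control, the app side needs to validate that output before using it. The sketch below assumes a hypothetical schema with `mood`, `energy`, and `key` fields and an energy range of 0.0 to 1.0; none of these details are specified in this README.

```python
import json

# Hypothetical schema: field names and ranges are assumptions for illustration.
REQUIRED_KEYS = ("mood", "energy", "key")


def parse_control(raw: str) -> dict:
    """Parse model output into a validated control dict."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    energy = float(data["energy"])
    return {
        "mood": str(data["mood"]),
        "energy": max(0.0, min(1.0, energy)),  # clamp to the assumed range
        "key": str(data["key"]),
    }


control = parse_control('{"mood": "calm", "energy": 1.4, "key": "A minor"}')
```

Clamping instead of rejecting out-of-range values is a design choice for a live signal chain, where a slightly wrong control value is usually preferable to a dropped frame.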
+---
+## Repository Structure
+```
+Ashiedu/Synesthesia/
+│
+├── manifest.json                    ← authoritative model registry
+│
+├── magenta_rt/
+│   ├── llm/                         ← MRT-001: JAX checkpoint + ONNX export
+│   ├── spectrostream/
+│   │   ├── encoder_fp32.onnx
+│   │   ├── encoder_fp16.onnx
+│   │   ├── decoder_fp32.onnx
+│   │   └── decoder_fp16.onnx
+│   └── musiccoca/
+│       ├── text_fp32.onnx
+│       ├── text_fp16.onnx
+│       ├── audio_fp32.onnx
+│       └── audio_fp16.onnx
+│
+├── midi/
+│   ├── perfrnn/                     ← MC-001: fp32 / fp16 / int8
+│   ├── melody_rnn/                  ← MC-002
+│   ├── drums_rnn/                   ← MC-003
+│   ├── improv_rnn/                  ← MC-004
+│   ├── polyphony_rnn/               ← MC-005
+│   ├── musicvae/                    ← MC-006: encoder + decoder
+│   ├── groovae/                     ← MC-007
+│   ├── midime/                      ← MC-008
+│   ├── music_transformer/           ← MC-009
+│   └── coconet/                     ← MC-010
+│
+├── audio/
+│   ├── gansynth/                    ← MA-001: fp32 / fp16
+│   ├── nsynth/                      ← MA-002
+│   ├── ddsp/                        ← MA-003+004: encoder + decoder
+│   ├── piano_genie/                 ← MA-005
+│   ├── onsets_and_frames/           ← MA-006
+│   └── spice/                       ← MA-007
+│
+└── llm/
+    └── gemma3n_e2b/
+        ├── q4_k_m.gguf              ← LV-001: default
+        ├── q2_k.gguf
+        └── f16.gguf
 ```
+Each subdirectory contains a `README.md` with input/output shapes,
+export commands, and Burn compatibility status.
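The model IDs from the inventory tables map onto the directory tree above. The lookup below is a sketch built by reading the two side by side; the real source of truth is `manifest.json`, whose schema is not shown here, and `MODEL_DIRS` and `repo_path` are illustrative names.

```python
# Model ID -> repo directory, transcribed from the tree and inventory tables.
MODEL_DIRS = {
    "MRT-001": "magenta_rt/llm",
    "MRT-002": "magenta_rt/spectrostream",
    "MRT-003": "magenta_rt/spectrostream",
    "MRT-004": "magenta_rt/musiccoca",
    "MRT-005": "magenta_rt/musiccoca",
    "MC-001": "midi/perfrnn",
    "MC-002": "midi/melody_rnn",
    "MC-003": "midi/drums_rnn",
    "MC-004": "midi/improv_rnn",
    "MC-005": "midi/polyphony_rnn",
    "MC-006": "midi/musicvae",
    "MC-007": "midi/groovae",
    "MC-008": "midi/midime",
    "MC-009": "midi/music_transformer",
    "MC-010": "midi/coconet",
    "MA-001": "audio/gansynth",
    "MA-002": "audio/nsynth",
    "MA-003": "audio/ddsp",
    "MA-004": "audio/ddsp",
    "MA-005": "audio/piano_genie",
    "MA-006": "audio/onsets_and_frames",
    "MA-007": "audio/spice",
    "LV-001": "llm/gemma3n_e2b",
}


def repo_path(model_id: str, filename: str) -> str:
    """Build the in-repo path for a file belonging to a model ID."""
    try:
        directory = MODEL_DIRS[model_id]
    except KeyError:
        raise ValueError(f"unknown model ID: {model_id}") from None
    return f"{directory}/{filename}"


print(repo_path("MRT-002", "encoder_fp16.onnx"))
```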
+---
+## Quality Tiers (ONNX models)
+| Tier | Suffix | VRAM est. | Use case |
+|------|--------|-----------|----------|
+| Full | `_fp32.onnx` | ~2–4× Half | Reference quality, CI validation |
+| **Half** | `_fp16.onnx` | Baseline | **Default — recommended for RX 6700 XT** |
+| Lite | `_int8.onnx` | ~0.5× Half | Lowest latency (MIDI models only) |
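The tier table can be turned into a small filename helper. This sketch assumes files are named as a stem plus the tier suffix, as in `encoder_fp16.onnx`; `tiered_filename` and `is_midi_model` are hypothetical names, and some single-model directories may use a bare `fp16.onnx` instead, so check each directory's README.

```python
# Suffixes copied from the quality tier table.
TIER_SUFFIX = {
    "full": "_fp32.onnx",
    "half": "_fp16.onnx",
    "lite": "_int8.onnx",
}


def tiered_filename(stem: str, tier: str = "half", is_midi_model: bool = False) -> str:
    """Return the ONNX filename for a stem at the requested quality tier."""
    if tier not in TIER_SUFFIX:
        raise ValueError(f"unknown tier: {tier}")
    if tier == "lite" and not is_midi_model:
        # The table lists the int8 tier for MIDI models only.
        raise ValueError("lite tier is only available for MIDI models")
    return stem + TIER_SUFFIX[tier]


print(tiered_filename("encoder"))                            # encoder_fp16.onnx
print(tiered_filename("model", "lite", is_midi_model=True))  # model_int8.onnx
```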
+---
+## Pulling Models in Rust
 ```rust
 use hf_hub::api::sync::Api;
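+use std::path::PathBuf;
+/// Download one file from the repo, or reuse the cached copy.
+/// Cache location: ~/.cache/huggingface/hub/
+pub fn pull(repo_path: &str) -> anyhow::Result<PathBuf> {
+    let api = Api::new()?;
+    let repo = api.model("Ashiedu/Synesthesia".to_string());
+    Ok(repo.get(repo_path)?)
+}
+// Example
+fn main() -> anyhow::Result<()> {
+    let path = pull("midi/perfrnn/fp16.onnx")?;
+    println!("model at {}", path.display());
+    Ok(())
+}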
 ```
+## Pulling Models in Python
+```python
+from huggingface_hub import snapshot_download, hf_hub_download
+# Pull everything
+snapshot_download("Ashiedu/Synesthesia", local_dir="./models")
+# Pull one file
+hf_hub_download(
+    repo_id="Ashiedu/Synesthesia",
+    filename="midi/perfrnn/fp16.onnx",
+    local_dir="./models",
+)
 ```
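Between pulling everything and pulling one file, `snapshot_download` also accepts an `allow_patterns` list for pulling a subset. The sketch below previews which paths a pattern list would select using `fnmatch`, which is believed to match how `huggingface_hub` filters files; `HALF_TIER_PATTERNS`, `would_download`, and `pull_half_tier` are illustrative names.

```python
from fnmatch import fnmatch

# Patterns for "registry plus every fp16 ONNX file".
HALF_TIER_PATTERNS = ["manifest.json", "*fp16.onnx"]


def would_download(path: str, patterns=HALF_TIER_PATTERNS) -> bool:
    """Preview whether a repo path matches any allow pattern."""
    return any(fnmatch(path, p) for p in patterns)


def pull_half_tier(local_dir: str = "./models") -> str:
    """Download only the matching files (needs network access)."""
    # Imported lazily so the preview helper works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "Ashiedu/Synesthesia",
        allow_patterns=HALF_TIER_PATTERNS,
        local_dir=local_dir,
    )


print(would_download("magenta_rt/spectrostream/encoder_fp16.onnx"))  # True
print(would_download("llm/gemma3n_e2b/f16.gguf"))                    # False
```

Filtering by tier keeps the download small when only the default half-precision models are needed.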
+---
+## Export Workflow (Colab)
+All models are exported from Colab and pushed here. The generic workflow:
+```python
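+# 0. Read the HF token from Colab Secrets (secret name "HF_TOKEN" is an assumption)
+from google.colab import userdata
+HF_TOKEN = userdata.get("HF_TOKEN")
+# 1. Pull existing checkpoint (if updating)
+from huggingface_hub import snapshot_download
+snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)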
+# 2. Clone Magenta source
+# !git clone https://github.com/magenta/magenta
+# !git clone https://github.com/magenta/magenta-realtime
+# 3. Export to ONNX (varies per model — see each model's README)
+# Magenta Classic: tf2onnx
+# Magenta RT: JAX → onnx via jax2onnx or flax export
+# Gemma-3N: Unsloth → GGUF
+# 4. Quantize: fp16 via onnxconverter-common, int8 via ONNX Runtime dynamic quantization
+import onnx
+from onnxconverter_common import float16
+from onnxruntime.quantization import quantize_dynamic, QuantType
+fp32 = onnx.load("model.onnx")
+fp16 = float16.convert_float_to_float16(fp32, keep_io_types=True)
+onnx.save(fp16, "model_fp16.onnx")
+quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
+# 5. Push to HF
+from huggingface_hub import HfApi
+api = HfApi(token=HF_TOKEN)  # set in Colab Secrets
+api.upload_file(
+    path_or_fileobj="model_fp16.onnx",
+    path_in_repo="midi/perfrnn/fp16.onnx",
+    repo_id="Ashiedu/Synesthesia",
+    commit_message="MC-001 Performance RNN fp16",
+)
 ```
+**Gemini on Colab:** Point Gemini at this README and the model's subdirectory
+README as context. Gemini can execute the export + push workflow without
+GitHub integration — it only needs Python and your HF token in Colab Secrets.
+---
+## Burn Compatibility Tracking
+A weekly CI job runs `burn-onnx` `ModelGen` against each exported model.
+Models migrate from the ORT fallback to Burn as Burn's ONNX op coverage matures.
+| Model | Burn target | ORT fallback | Last checked |
+|-------|------------|--------------|-------------|
+| DDSP enc/dec | ✅ | ❌ | — |
+| GANSynth | ✅ | ❌ | — |
+| NSynth | ✅ | ❌ | — |
+| Piano Genie | ✅ | ❌ | — |
+| Performance RNN | 🔄 LSTM | ✅ | — |
+| Melody RNN | 🔄 LSTM | ✅ | — |
+| Drums RNN | 🔄 LSTM | ✅ | — |
+| Improv RNN | 🔄 LSTM | ✅ | — |
+| Polyphony RNN | 🔄 LSTM | ✅ | — |
+| MusicVAE | 🔄 BiLSTM | ✅ | — |
+| Coconet | 🔄 Conv | ✅ | — |
+| Music Transformer | 🔄 Attention | ✅ | — |
+| Onsets & Frames | 🔄 Conv+LSTM | ✅ | — |
+| SpectroStream | 🔄 Conv | ✅ | — |
+| MusicCoCa | 🔄 ViT+Transformer | ✅ | — |
+| Gemma-3N | N/A — llama.cpp | ❌ | — |
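The table above amounts to a per-model runtime decision. A minimal sketch of that dispatch follows, with the statuses transcribed from the table; the short model keys, the status strings, and `pick_runtime` are hypothetical names, and the app's real dispatch logic is not shown in this README.

```python
# Burn status per model, transcribed from the compatibility table.
BURN_STATUS = {
    "ddsp": "ready",
    "gansynth": "ready",
    "nsynth": "ready",
    "piano_genie": "ready",
    "perfrnn": "pending",
    "melody_rnn": "pending",
    "drums_rnn": "pending",
    "improv_rnn": "pending",
    "polyphony_rnn": "pending",
    "musicvae": "pending",
    "coconet": "pending",
    "music_transformer": "pending",
    "onsets_and_frames": "pending",
    "spectrostream": "pending",
    "musiccoca": "pending",
    "gemma3n": "n/a",
}


def pick_runtime(model: str) -> str:
    """Choose a runtime label from the model's Burn status."""
    status = BURN_STATUS[model]  # raises KeyError for unknown models
    if status == "ready":
        return "burn-wgpu"
    if status == "pending":
        return "ort-directml"  # fallback until Burn op coverage matures
    return "llama.cpp-vulkan"  # GGUF models never go through Burn


print(pick_runtime("ddsp"))      # burn-wgpu
print(pick_runtime("musicvae"))  # ort-directml
print(pick_runtime("gemma3n"))   # llama.cpp-vulkan
```

Keeping the statuses in one table means a model can move from the ORT fallback to Burn by changing a single entry once the weekly check passes.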
+---
+## Training Philosophy
+**Train after the app works.** The interface ships first. Training data
+is determined by what the working app actually receives as input in practice.
+Fine-tune on your own audio and MIDI once the signal chain is wired.
+Tentative fine-tuning order once the app is functional:
+1. Performance RNN — live MIDI from the Track Mixer
+2. MusicVAE / GrooVAE — latent interpolation between patches
+3. GANSynth — timbre generation from pitch + latent input
+4. DDSP — resynthesis of GANSynth outputs
+5. Magenta RT — full audio, conditioned on your own catalog
+6. Gemma-3N — camera → mood/energy trained on your session recordings
+---
 ## License
+- Codebase: Apache 2.0
+- Magenta Classic weights: Apache 2.0
+- Magenta RT weights: Apache 2.0 with additional [bespoke terms](https://github.com/magenta/magenta-realtime/blob/main/LICENSE)
+- Gemma-3N: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
+Individual model directories note any additional upstream license terms.
+---
 ## Links
+- **App:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia)
+- **Magenta RT:** [magenta/magenta-realtime](https://github.com/magenta/magenta-realtime)
+- **Magenta Classic:** [magenta/magenta](https://github.com/magenta/magenta)
+- **HF Model Card:** [google/magenta-realtime](https://huggingface.co/google/magenta-realtime)
+- **Roadmap:** GitHub Issues — `lane:ml` label