| --- |
| license: apache-2.0 |
| task_categories: |
| - audio-to-audio |
| - text-to-audio |
| - image-to-text |
| tags: |
| - music-generation |
| - magenta |
| - magenta-rt |
| - onnx |
| - burn |
| - llama-cpp |
| - performance-rnn |
| - melody-rnn |
| - drums-rnn |
| - improv-rnn |
| - polyphony-rnn |
| - musicvae |
| - groovae |
| - piano-genie |
| - ddsp |
| - gansynth |
| - nsynth |
| - coconet |
| - music-transformer |
| - onsets-and-frames |
| - spectrostream |
| - musiccoca |
| - synesthesia |
| - directml |
| - vulkan |
| - wgpu |
| - audio |
| - midi |
| language: |
| - en |
| library_name: onnxruntime |
| base_model: |
| - unsloth/gemma-3n-E2B-it |
| - google/magenta-realtime |
| --- |
| |
| # Synesthesia: AI Music Models |
|
|
| ONNX and GGUF model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia), |
| a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app. |
|
|
| Synesthesia brings together every open-weights model from **Magenta Classic** and |
| **Magenta RT** under one repo, exportable to ONNX for local inference and continuously |
| fine-tunable via free Google Colab notebooks. |
|
|
| --- |
|
|
| ## Inference Runtimes |
|
|
| | Runtime | Models | Backend | Notes | |
| |---------|--------|---------|-------| |
| | **Burn wgpu** | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required | |
| | **ORT + DirectML** | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures | |
| | **llama.cpp + Vulkan** | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format | |
| | **Magenta RT (JAX)** | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning | |
|
|
| Vulkan works on AMD without ROCm on Windows 11. All runtimes target the RX 6700 XT. |
|
|
| --- |
|
|
| ## Model Inventory |
|
|
| ### Magenta RT (Real-Time Audio Generation) |
|
|
| Magenta RT is composed of three components working as a pipeline: |
| SpectroStream (audio codec), MusicCoCa (style embeddings), and an encoder-decoder |
| transformer LLM. It is the only open-weights model supporting real-time continuous |
| musical audio generation. |
|
|
| The LLM is an 800-million-parameter autoregressive transformer trained on |
| ~190k hours of stock music. It uses 38% fewer parameters |
| than Stable Audio Open and 77% fewer than MusicGen Large. |
|
|
| | ID | Model | Format | Task | Synesthesia Role | |
| |----|-------|--------|------|-----------------| |
| | MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine | |
| | MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48 kHz stereo, 25 Hz, 64 RVQ) | Audio tokenizer | |
| | MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48 kHz stereo audio | Audio detokenizer | |
| | MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control | |
| | MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control | |
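
The SpectroStream figures in the table imply a fixed token budget per second of audio. The sketch below works that arithmetic through. It assumes "25 Hz" is the token frame rate and "64 RVQ" means 64 residual-quantizer codes per frame; `token_count` is a hypothetical helper, not part of any Magenta API.

```python
# Figures taken from the MRT-002 row above.
FRAME_RATE_HZ = 25  # token frames per second of audio
RVQ_DEPTH = 64      # assumed: RVQ codes emitted per frame


def token_count(seconds: float) -> int:
    """Number of discrete tokens covering `seconds` of audio."""
    frames = round(seconds * FRAME_RATE_HZ)
    return frames * RVQ_DEPTH


if __name__ == "__main__":
    for seconds in (1.0, 2.0, 10.0):
        print(f"{seconds:>5.1f} s -> {token_count(seconds)} tokens")
```

Under these assumptions one second of audio corresponds to 25 × 64 = 1600 tokens, which is the number to keep in mind when sizing context windows or buffers around the codec.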
|
|
| **Finetuning:** Runs on a free Colab TPU v2-8 via `Magenta_RT_Finetune.ipynb`, so you |
| can adapt the model to your own audio catalog. The official Colab demos support live |
| generation, finetuning, and live audio injection, which mixes user audio with the model |
| output and feeds the mix back as context for the next generation chunk. |
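
The audio-injection loop described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Colab demo's code. `mix`, `run_injection_loop`, the `user_gain` parameter, and the `generate` callback (which stands in for one Magenta RT generation step) are all hypothetical names, and chunks are modeled as flat lists of float samples.

```python
from typing import Callable, List

Chunk = List[float]  # one chunk of mono samples in [-1.0, 1.0]


def mix(model_out: Chunk, user_in: Chunk, user_gain: float = 0.5) -> Chunk:
    """Sum model output and scaled user audio, clipping to [-1, 1]."""
    if len(model_out) != len(user_in):
        raise ValueError("chunks must have the same length")
    return [max(-1.0, min(1.0, m + user_gain * u))
            for m, u in zip(model_out, user_in)]


def run_injection_loop(
    generate: Callable[[Chunk], Chunk],  # hypothetical: context -> next chunk
    user_chunks: List[Chunk],
    first_context: Chunk,
) -> List[Chunk]:
    """Generate one chunk per user chunk, feeding the mix back as context."""
    context = first_context
    outputs: List[Chunk] = []
    for user_chunk in user_chunks:
        out = generate(context)
        outputs.append(out)
        # The *mix*, not the raw model output, becomes the next context.
        context = mix(out, user_chunk)
    return outputs
```

The key design point is that the listener hears the model output while the model itself conditions on the mix, so user audio steers later chunks without being replayed verbatim.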
|
|
| --- |
|
|
| ### Magenta Classic: MIDI / Symbolic |
|
|
| MusicRNN implements Magenta's LSTM-based language models: |
| MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN. |
|
|
| | ID | Model | Format | Task | Synesthesia Role | |
| |----|-------|--------|------|-----------------| |
| | MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation | |
| | MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool | |
| | MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation | |
| | MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions | |
| | MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot) | Harmonic voice generation | |
| | MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE: melody, drum, trio loops | Latent interpolation, style morphing | |
| | MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums | |
| | MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space | |
| | MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition | |
| | MC-010 | Coconet | ONNX | Counterpoint by convolution: completes partial scores | Harmony / counterpoint filler | |
|
|
| --- |
|
|
| ### Magenta Classic: Audio / Timbre |
|
|
| | ID | Model | Format | Task | Synesthesia Role | |
| |----|-------|--------|------|-----------------| |
| | MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument | |
| | MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation | |
| | MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis | |
| | MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis | |
| | MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance | |
| | MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription | |
| | MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking | |
|
|
| --- |
|
|
| ### LLM / Vision Control |
|
|
| | ID | Model | Format | Task | Synesthesia Role | |
| |----|-------|--------|------|-----------------| |
| | LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control | |
|
|
| **Format tiers:** |
| - `q4_k_m.gguf`: default (recommended, ~1.5 GB) |
| - `q2_k.gguf`: lite tier (fastest, smallest) |
| - `f16.gguf`: full-quality reference |
|
|
| **Runtime:** `llama-cpp-v3` Rust crate with Vulkan backend. |
| Same stack as LM Studio: no ROCm, no CUDA needed on Windows. |
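
Since LV-001 is expected to return structured JSON for mood/energy/key control, the consuming side needs to validate that output before it reaches the synth. The sketch below shows one way to do that. The field names `mood`, `energy`, and `key`, the 0 to 1 energy range, and the `ControlFrame` / `parse_control` names are assumptions for illustration; the actual schema is defined by the app, not by this README.

```python
import json
from dataclasses import dataclass


@dataclass
class ControlFrame:
    mood: str      # assumed free-text label, e.g. "calm"
    energy: float  # assumed to be normalized to [0.0, 1.0]
    key: str       # assumed musical key label, e.g. "A minor"


def parse_control(raw: str) -> ControlFrame:
    """Parse and sanity-check one JSON reply from the vision model."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"mood", "energy", "key"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Clamp instead of rejecting, so a slightly off reply still drives the synth.
    energy = min(1.0, max(0.0, float(data["energy"])))
    return ControlFrame(mood=str(data["mood"]), energy=energy, key=str(data["key"]))
```

For example, `parse_control('{"mood": "calm", "energy": 1.7, "key": "A minor"}')` yields a frame whose `energy` is clamped to `1.0`.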
|
|
| --- |
|
|
| ## Repository Structure |
|
|
| ``` |
| Ashiedu/Synesthesia/ |
| │ |
| ├── manifest.json ← authoritative model registry |
| │ |
| ├── magenta_rt/ |
| │   ├── llm/ ← MRT-001: JAX checkpoint + ONNX export |
| │   ├── spectrostream/ |
| │   │   ├── encoder_fp32.onnx |
| │   │   ├── encoder_fp16.onnx |
| │   │   ├── decoder_fp32.onnx |
| │   │   └── decoder_fp16.onnx |
| │   └── musiccoca/ |
| │       ├── text_fp32.onnx |
| │       ├── text_fp16.onnx |
| │       ├── audio_fp32.onnx |
| │       └── audio_fp16.onnx |
| │ |
| ├── midi/ |
| │   ├── perfrnn/ ← MC-001: fp32 / fp16 / int8 |
| │   ├── melody_rnn/ ← MC-002 |
| │   ├── drums_rnn/ ← MC-003 |
| │   ├── improv_rnn/ ← MC-004 |
| │   ├── polyphony_rnn/ ← MC-005 |
| │   ├── musicvae/ ← MC-006: encoder + decoder |
| │   ├── groovae/ ← MC-007 |
| │   ├── midime/ ← MC-008 |
| │   ├── music_transformer/ ← MC-009 |
| │   └── coconet/ ← MC-010 |
| │ |
| ├── audio/ |
| │   ├── gansynth/ ← MA-001: fp32 / fp16 |
| │   ├── nsynth/ ← MA-002 |
| │   ├── ddsp/ ← MA-003+004: encoder + decoder |
| │   ├── piano_genie/ ← MA-005 |
| │   ├── onsets_and_frames/ ← MA-006 |
| │   └── spice/ ← MA-007 |
| │ |
| └── llm/ |
|     └── gemma3n_e2b/ |
|         ├── q4_k_m.gguf ← LV-001: default |
|         ├── q2_k.gguf |
|         └── f16.gguf |
| ``` |
|
|
| Each subdirectory contains a `README.md` with input/output shapes, |
| export commands, and Burn compatibility status. |
|
|
| --- |
|
|
| ## Quality Tiers (ONNX models) |
|
|
| | Tier | Suffix | VRAM est. | Use case | |
| |------|--------|-----------|----------| |
| | Full | `_fp32.onnx` | ~2–4× Half | Reference quality, CI validation | |
| | **Half** | `_fp16.onnx` | Baseline | **Default, recommended for RX 6700 XT** | |
| | Lite | `_int8.onnx` | ~0.5× Half | Lowest latency (MIDI models only) | |
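
The tier table maps naturally onto a small lookup. The sketch below builds a file name from a stem and a tier, and refuses the Lite tier for non-MIDI models as the table specifies. `tier_filename`, its `midi_model` flag, and the tier keys are hypothetical names. The helper follows the suffix convention in the table, so check each model directory's README for the file names actually published there.

```python
# Suffixes copied from the Quality Tiers table.
TIER_SUFFIX = {
    "full": "_fp32.onnx",
    "half": "_fp16.onnx",  # default tier
    "lite": "_int8.onnx",  # MIDI models only
}


def tier_filename(stem: str, tier: str = "half", midi_model: bool = False) -> str:
    """Return `<stem><suffix>` for the requested quality tier."""
    if tier not in TIER_SUFFIX:
        raise ValueError(f"unknown tier: {tier!r}")
    if tier == "lite" and not midi_model:
        raise ValueError("the int8 tier is only provided for MIDI models")
    return stem + TIER_SUFFIX[tier]
```

For instance, `tier_filename("encoder")` gives `encoder_fp16.onnx`, matching the SpectroStream entries in the repository tree.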
|
|
| --- |
|
|
| ## Pulling Models in Rust |
|
|
| ```rust |
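| use std::path::PathBuf; |
| |
| use hf_hub::api::sync::Api; |
| |
| /// Download one file from the repo, or reuse the cached copy |
| /// (default cache: ~/.cache/huggingface/hub/). |
| pub fn pull(repo_path: &str) -> anyhow::Result<PathBuf> { |
|     let api = Api::new()?; |
|     let repo = api.model("Ashiedu/Synesthesia".to_string()); |
|     Ok(repo.get(repo_path)?) |
| } |
| |
| fn main() -> anyhow::Result<()> { |
|     // Example: fetch the fp16 Performance RNN export. |
|     let path = pull("midi/perfrnn/fp16.onnx")?; |
|     println!("{}", path.display()); |
|     Ok(()) |
| } |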
| ``` |
|
|
| ## Pulling Models in Python |
|
|
| ```python |
| from huggingface_hub import snapshot_download, hf_hub_download |
| |
| # Pull everything |
| snapshot_download("Ashiedu/Synesthesia", local_dir="./models") |
| |
| # Pull one file |
| hf_hub_download( |
| repo_id="Ashiedu/Synesthesia", |
| filename="midi/perfrnn/fp16.onnx", |
| local_dir="./models", |
| ) |
| ``` |
|
|
| --- |
|
|
| ## Export Workflow (Colab) |
|
|
| All models are exported from Colab and pushed here. The generic workflow: |
|
|
| ```python |
| # 0. Read the HF token from Colab Secrets (assumes the secret is named "HF_TOKEN") |
| from google.colab import userdata |
| HF_TOKEN = userdata.get("HF_TOKEN") |
| |
| # 1. Pull existing checkpoint (if updating) |
| from huggingface_hub import snapshot_download |
| snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN) |
| |
| # 2. Clone Magenta source |
| # !git clone https://github.com/magenta/magenta |
| # !git clone https://github.com/magenta/magenta-realtime |
| |
| # 3. Export to ONNX (varies per model; see each model's README) |
| # Magenta Classic: tf2onnx |
| # Magenta RT: JAX → ONNX via jax2onnx or flax export |
| # Gemma-3N: Unsloth → GGUF |
| |
| # 4. Quantize: fp16 for every model, int8 for MIDI models only |
| import onnx |
| from onnxconverter_common import float16 |
| from onnxruntime.quantization import quantize_dynamic, QuantType |
| |
| fp32 = onnx.load("model.onnx") |
| fp16 = float16.convert_float_to_float16(fp32, keep_io_types=True) |
| onnx.save(fp16, "model_fp16.onnx") |
| quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8) |
| |
| # 5. Push to HF |
| from huggingface_hub import HfApi |
| api = HfApi(token=HF_TOKEN) |
| api.upload_file( |
|     path_or_fileobj="model_fp16.onnx", |
|     path_in_repo="midi/perfrnn/fp16.onnx", |
|     repo_id="Ashiedu/Synesthesia", |
|     commit_message="MC-001 Performance RNN fp16", |
| ) |
| ``` |
|
|
| **Gemini on Colab:** Point Gemini at this README and the model's subdirectory |
| README as context. Gemini can execute the export + push workflow without |
| GitHub integration; it only needs Python and your HF token in Colab Secrets. |
|
|
| --- |
|
|
| ## Burn Compatibility Tracking |
|
|
| A weekly CI job attempts `burn-onnx ModelGen` on each exported model. |
| Models migrate from the ORT fallback to Burn as op coverage matures. |
|
|
| | Model | Burn target | ORT fallback | Last checked | |
| |-------|------------|--------------|-------------| |
| | DDSP enc/dec | ✅ | - | - | |
| | GANSynth | ✅ | - | - | |
| | NSynth | ✅ | - | - | |
| | Piano Genie | ✅ | - | - | |
| | Performance RNN | Pending: LSTM | ✅ | - | |
| | Melody RNN | Pending: LSTM | ✅ | - | |
| | Drums RNN | Pending: LSTM | ✅ | - | |
| | Improv RNN | Pending: LSTM | ✅ | - | |
| | Polyphony RNN | Pending: LSTM | ✅ | - | |
| | MusicVAE | Pending: BiLSTM | ✅ | - | |
| | Coconet | Pending: Conv | ✅ | - | |
| | Music Transformer | Pending: Attention | ✅ | - | |
| | Onsets & Frames | Pending: Conv+LSTM | ✅ | - | |
| | SpectroStream | Pending: Conv | ✅ | - | |
| | MusicCoCa | Pending: ViT+Transformer | ✅ | - | |
| | Gemma-3N | N/A (llama.cpp) | - | - | |
|
|
| --- |
|
|
| ## Training Philosophy |
|
|
| **Train after the app works.** The interface ships first. Training data |
| is determined by what the working app actually receives as input in practice. |
| Fine-tune on your own audio and MIDI once the signal chain is wired. |
|
|
| Tentative fine-tuning order once the app is functional: |
| 1. Performance RNN: live MIDI from the Track Mixer |
| 2. MusicVAE / GrooVAE: latent interpolation between patches |
| 3. GANSynth: timbre generation from pitch + latent input |
| 4. DDSP: resynthesis of GANSynth outputs |
| 5. Magenta RT: full audio, conditioned on your own catalog |
| 6. Gemma-3N: camera → mood/energy, trained on your session recordings |
|
|
| --- |
|
|
| ## License |
|
|
| - Codebase: Apache 2.0 |
| - Magenta Classic weights: Apache 2.0 |
| - Magenta RT weights: Apache 2.0 with additional [bespoke terms](https://github.com/magenta/magenta-realtime/blob/main/LICENSE) |
| - Gemma-3N: [Gemma Terms of Use](https://ai.google.dev/gemma/terms) |
|
|
| Individual model directories note any additional upstream license terms. |
|
|
| --- |
|
|
| ## Links |
|
|
| - **App:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia) |
| - **Magenta RT:** [magenta/magenta-realtime](https://github.com/magenta/magenta-realtime) |
| - **Magenta Classic:** [magenta/magenta](https://github.com/magenta/magenta) |
| - **HF Model Card:** [google/magenta-realtime](https://huggingface.co/google/magenta-realtime) |
| - **Roadmap:** GitHub Issues, `lane:ml` label |