Synesthesia — AI Music Models

ONNX and GGUF model weights for Synesthesia, a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.

Synesthesia brings together every open-weights model from Magenta Classic and Magenta RT under one repo, exportable to ONNX for local inference and continuously fine-tunable via free Google Colab notebooks.


Inference Runtimes

| Runtime | Models | Backend | Notes |
|---|---|---|---|
| Burn wgpu | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
| ORT + DirectML | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
| llama.cpp + Vulkan | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
| Magenta RT (JAX) | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |

Vulkan works on AMD without ROCm on Windows 11. All runtimes target the RX 6700 XT.


Model Inventory

Magenta RT (Real-Time Audio Generation)

Magenta RT is a pipeline of three components: SpectroStream (the audio codec), MusicCoCa (style embeddings), and an encoder-decoder transformer LLM. It is the only open-weights model that supports real-time, continuous musical audio generation.

It is an 800-million-parameter autoregressive transformer trained on ~190k hours of stock music, using 38% fewer parameters than Stable Audio Open and 77% fewer than MusicGen Large.

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
| MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48kHz stereo, 25Hz, 64 RVQ) | Audio tokenizer |
| MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48kHz stereo audio | Audio detokenizer |
| MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
| MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |

Finetuning: free Colab TPU v2-8 via Magenta_RT_Finetune.ipynb, customizable to your own audio catalog. The official Colab demos support live generation, finetuning, and live audio injection: the user's audio is mixed with the model's output and fed back as context for the next generation chunk.
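
The audio-injection loop lends itself to a short sketch. This is an illustrative toy in plain Python, not Magenta RT's actual API; the chunk layout and the 0.5 mix gain are assumptions:

```python
def inject(user_chunk, model_chunk, gain=0.5):
    """Blend user audio into the model's output, sample by sample.
    gain is the share of the user signal in the mix (illustrative value)."""
    return [gain * u + (1.0 - gain) * m for u, m in zip(user_chunk, model_chunk)]

def generation_loop(model_step, user_chunks):
    """Run the injection loop: each mixed chunk becomes the next context.
    model_step is a stand-in for the SpectroStream + LLM pipeline."""
    context = [0.0] * len(user_chunks[0])  # silent initial context
    output = []
    for user in user_chunks:
        chunk = model_step(context)        # generate from current context
        context = inject(user, chunk)      # mix, then feed back as context
        output.append(chunk)
    return output
```

The real pipeline operates on RVQ token streams rather than raw samples; the point here is only the feedback topology.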


Magenta Classic — MIDI / Symbolic

MusicRNN implements Magenta's LSTM-based language models: MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
| MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
| MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
| MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
| MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot) | Harmonic voice generation |
| MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE: melody, drum, trio loops | Latent interpolation, style morphing |
| MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
| MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
| MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
| MC-010 | Coconet | ONNX | Counterpoint by convolution: complete partial scores | Harmony / counterpoint filler |
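
MusicVAE's "latent interpolation" role (MC-006) boils down to decoding points along a path between two encoded loops. A minimal sketch of the path construction, with the encoder and decoder calls left out (real implementations often use spherical rather than linear interpolation):

```python
def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors at position t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

def morph_path(z_a, z_b, steps):
    """Evenly spaced latent points from z_a to z_b, endpoints included.
    Decode each point with the MusicVAE decoder to hear the morph."""
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```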

Magenta Classic — Audio / Timbre

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
| MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
| MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis |
| MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis |
| MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
| MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
| MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
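
DDSP's decoder (MA-003/004) drives an explicit synthesizer: a bank of harmonic sinusoids plus filtered noise. A toy version of the harmonic half, with a fixed f0 and static per-harmonic amplitudes (the real decoder predicts both per frame; the sample rate and length here are arbitrary):

```python
import math

def harmonic_synth(f0, amps, sample_rate=16000, n_samples=160):
    """Sum sinusoids at integer multiples of f0, weighted by amps.
    Static, mono toy version of DDSP's harmonic oscillator bank."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.sin(2.0 * math.pi * f0 * (k + 1) * t)
                       for k, a in enumerate(amps)))
    return out
```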

LLM / Vision Control

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |

Format tiers:

  • q4_k_m.gguf — default (recommended, ~1.5GB)
  • q2_k.gguf — lite tier (fastest, smallest)
  • f16.gguf — full quality reference

Runtime: llama-cpp-v3 Rust crate with Vulkan backend. Same stack as LM Studio — no ROCm, no CUDA needed on Windows.
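
The camera path produces "structured JSON" control messages. The schema below (mood, energy, key) is a hypothetical example of what such a message could look like, with a validator for the three fields the table mentions; none of these field names are a documented Synesthesia format:

```python
import json

# Hypothetical control-message schema for the camera -> synth path.
VALID_KEYS = {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"}

def parse_control(raw):
    """Parse and validate one model response; raise ValueError if malformed."""
    msg = json.loads(raw)
    if not isinstance(msg.get("mood"), str):
        raise ValueError("mood must be a string")
    energy = msg.get("energy")
    if not isinstance(energy, (int, float)) or not 0.0 <= energy <= 1.0:
        raise ValueError("energy must be a number in [0, 1]")
    if msg.get("key") not in VALID_KEYS:
        raise ValueError("key must be a pitch class like 'F#'")
    return msg
```

Validating model output before it touches the synth engine matters here because small quantized models occasionally emit malformed JSON.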


Repository Structure

Ashiedu/Synesthesia/
│
├── manifest.json                    ← authoritative model registry
│
├── magenta_rt/
│   ├── llm/                         ← MRT-001: JAX checkpoint + ONNX export
│   ├── spectrostream/
│   │   ├── encoder_fp32.onnx
│   │   ├── encoder_fp16.onnx
│   │   ├── decoder_fp32.onnx
│   │   └── decoder_fp16.onnx
│   └── musiccoca/
│       ├── text_fp32.onnx
│       ├── text_fp16.onnx
│       ├── audio_fp32.onnx
│       └── audio_fp16.onnx
│
├── midi/
│   ├── perfrnn/                     ← MC-001: fp32 / fp16 / int8
│   ├── melody_rnn/                  ← MC-002
│   ├── drums_rnn/                   ← MC-003
│   ├── improv_rnn/                  ← MC-004
│   ├── polyphony_rnn/               ← MC-005
│   ├── musicvae/                    ← MC-006: encoder + decoder
│   ├── groovae/                     ← MC-007
│   ├── midime/                      ← MC-008
│   ├── music_transformer/           ← MC-009
│   └── coconet/                     ← MC-010
│
├── audio/
│   ├── gansynth/                    ← MA-001: fp32 / fp16
│   ├── nsynth/                      ← MA-002
│   ├── ddsp/                        ← MA-003+004: encoder + decoder
│   ├── piano_genie/                 ← MA-005
│   ├── onsets_and_frames/           ← MA-006
│   └── spice/                       ← MA-007
│
└── llm/
    └── gemma3n_e2b/
        ├── q4_k_m.gguf              ← LV-001: default
        ├── q2_k.gguf
        └── f16.gguf

Each subdirectory contains a README.md with input/output shapes, export commands, and Burn compatibility status.
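
Since manifest.json is the authoritative registry, runtime code should resolve model IDs through it rather than hard-coding paths. Assuming each entry maps an ID to a repo path and its available tiers (an illustrative schema; the real manifest's field names may differ), a lookup helper might be:

```python
def resolve(manifest, model_id, tier="fp16"):
    """Map a model ID plus quality tier to a repo-relative ONNX path.
    Manifest schema here is a guess: {"MC-001": {"path": ..., "tiers": [...]}}."""
    entry = manifest[model_id]
    if tier not in entry["tiers"]:
        raise KeyError(f"{model_id} has no {tier} tier")
    return f"{entry['path']}/{tier}.onnx"
```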


Quality Tiers (ONNX models)

| Tier | Suffix | VRAM est. | Use case |
|---|---|---|---|
| Full | _fp32.onnx | ~2–4× Half | Reference quality, CI validation |
| Half | _fp16.onnx | Baseline | Default, recommended for RX 6700 XT |
| Lite | _int8.onnx | ~0.5× Half | Lowest latency (MIDI models only) |
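
A tier-picking helper matching the table above; the VRAM thresholds are illustrative guesses, not measured numbers:

```python
def pick_tier(vram_free_mb, is_midi_model):
    """Choose an ONNX filename suffix from free VRAM, per the tier table.
    Thresholds are assumptions; int8 exists only for the MIDI models."""
    if is_midi_model and vram_free_mb < 1024:
        return "_int8.onnx"   # Lite: lowest latency, MIDI models only
    if vram_free_mb >= 8192:
        return "_fp32.onnx"   # Full: reference quality
    return "_fp16.onnx"       # Half: the default for the RX 6700 XT
```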

Pulling Models in Rust

```rust
use hf_hub::api::sync::Api;

/// Download a single file from the Synesthesia repo.
/// Files are cached under ~/.cache/huggingface/hub/.
pub fn pull(repo_path: &str) -> anyhow::Result<std::path::PathBuf> {
    let api = Api::new()?;
    let repo = api.model("Ashiedu/Synesthesia".to_string());
    Ok(repo.get(repo_path)?)
}

// Example: `?` requires a function returning anyhow::Result
fn main() -> anyhow::Result<()> {
    let path = pull("midi/perfrnn/fp16.onnx")?;
    println!("cached at {}", path.display());
    Ok(())
}
```

Pulling Models in Python

```python
from huggingface_hub import snapshot_download, hf_hub_download

# Pull everything
snapshot_download("Ashiedu/Synesthesia", local_dir="./models")

# Pull one file
hf_hub_download(
    repo_id="Ashiedu/Synesthesia",
    filename="midi/perfrnn/fp16.onnx",
    local_dir="./models",
)
```

Export Workflow (Colab)

All models are exported from Colab and pushed here. The generic workflow:

```python
# 1. Pull the existing checkpoint (if updating)
from huggingface_hub import snapshot_download
snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)

# 2. Clone Magenta source
# !git clone https://github.com/magenta/magenta
# !git clone https://github.com/magenta/magenta-realtime

# 3. Export to ONNX (varies per model; see each model's README)
#    Magenta Classic: tf2onnx
#    Magenta RT: JAX -> ONNX via jax2onnx or flax export
#    Gemma-3N: Unsloth -> GGUF

# 4. Quantize
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

fp32 = onnx.load("model.onnx")
fp16 = float16.convert_float_to_float16(fp32, keep_io_types=True)
onnx.save(fp16, "model_fp16.onnx")
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# 5. Push to HF
from huggingface_hub import HfApi
api = HfApi(token=HF_TOKEN)  # HF_TOKEN set in Colab Secrets
api.upload_file(
    path_or_fileobj="model_fp16.onnx",
    path_in_repo="midi/perfrnn/fp16.onnx",
    repo_id="Ashiedu/Synesthesia",
    commit_message="MC-001 Performance RNN fp16",
)
```

Gemini on Colab: Point Gemini at this README and the model's subdirectory README as context. Gemini can execute the export + push workflow without GitHub integration — it only needs Python and your HF token in Colab Secrets.


Burn Compatibility Tracking

A weekly CI job runs burn-onnx ModelGen against each exported model. Models migrate from the ORT fallback to Burn as op coverage matures.

| Model | Burn target | ORT fallback | Last checked |
|---|---|---|---|
| DDSP enc/dec | ✅ | ❌ | — |
| GANSynth | ✅ | ❌ | — |
| NSynth | ✅ | ❌ | — |
| Piano Genie | ✅ | ❌ | — |
| Performance RNN | 🔄 LSTM | ✅ | — |
| Melody RNN | 🔄 LSTM | ✅ | — |
| Drums RNN | 🔄 LSTM | ✅ | — |
| Improv RNN | 🔄 LSTM | ✅ | — |
| Polyphony RNN | 🔄 LSTM | ✅ | — |
| MusicVAE | 🔄 BiLSTM | ✅ | — |
| Coconet | 🔄 Conv | ✅ | — |
| Music Transformer | 🔄 Attention | ✅ | — |
| Onsets & Frames | 🔄 Conv+LSTM | ✅ | — |
| SpectroStream | 🔄 Conv | ✅ | — |
| MusicCoCa | 🔄 ViT+Transformer | ✅ | — |
| Gemma-3N | N/A — llama.cpp | ❌ | — |

Training Philosophy

Train after the app works. The interface ships first. Training data is determined by what the working app actually receives as input in practice. Fine-tune on your own audio and MIDI once the signal chain is wired.

Tentative fine-tuning order once the app is functional:

  1. Performance RNN — live MIDI from the Track Mixer
  2. MusicVAE / GrooVAE — latent interpolation between patches
  3. GANSynth — timbre generation from pitch + latent input
  4. DDSP — resynthesis of GANSynth outputs
  5. Magenta RT — full audio, conditioned on your own catalog
  6. Gemma-3N — camera → mood/energy trained on your session recordings

License

Individual model directories note any additional upstream license terms.

