# Synesthesia: AI Music Models

ONNX and GGUF model weights for Synesthesia, a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.

Synesthesia brings together every open-weights model from Magenta Classic and Magenta RT under one repo, exportable to ONNX for local inference and continuously fine-tunable via free Google Colab notebooks.
## Inference Runtimes
| Runtime | Models | Backend | Notes |
|---|---|---|---|
| Burn wgpu | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
| ORT + DirectML | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
| llama.cpp + Vulkan | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
| Magenta RT (JAX) | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |
Vulkan works on AMD GPUs on Windows 11 without ROCm. All runtimes target the RX 6700 XT.
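The table above can be read as a routing policy: each model family has a preferred backend, with ORT + DirectML as the general fallback. A minimal sketch of that dispatch (the names and the dispatch function are illustrative, not the app's actual API):

```python
# Hypothetical runtime routing, mirroring the table above.
RUNTIME_BY_MODEL = {
    "ddsp": "burn-wgpu",
    "gansynth": "burn-wgpu",
    "nsynth": "burn-wgpu",
    "piano_genie": "burn-wgpu",
    "perfrnn": "ort-directml",
    "musicvae": "ort-directml",
    "coconet": "ort-directml",
    "onsets_and_frames": "ort-directml",
    "gemma3n_e2b": "llama-cpp-vulkan",
    "magenta_rt": "jax-tpu",
}

def runtime_for(model_id: str) -> str:
    """Return the backend for a model, defaulting to the ORT fallback."""
    return RUNTIME_BY_MODEL.get(model_id, "ort-directml")

print(runtime_for("ddsp"))      # burn-wgpu
print(runtime_for("melody_rnn"))  # ort-directml (fallback)
```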
## Model Inventory
### Magenta RT (Real-Time Audio Generation)
Magenta RT is a pipeline of three components: SpectroStream (audio codec), MusicCoCa (style embeddings), and an encoder-decoder transformer LLM. It is the only open-weights model supporting real-time, continuous musical audio generation.

The LLM is an 800-million-parameter autoregressive transformer trained on ~190k hours of stock music: 38% fewer parameters than Stable Audio Open and 77% fewer than MusicGen Large.
| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
| MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48 kHz stereo, 25 Hz, 64 RVQ) | Audio tokenizer |
| MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48 kHz stereo audio | Audio detokenizer |
| MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
| MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |
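The SpectroStream figures in the table imply a concrete token budget for the LLM: at a 25 Hz frame rate with 64 RVQ levels, one second of audio costs 25 × 64 = 1600 tokens. A back-of-the-envelope check (the 2 s chunk length is an assumption for illustration):

```python
# Token budget for SpectroStream (MRT-002/003), from the table above:
# 25 Hz token frame rate, 64 RVQ codebook levels per frame.
FRAME_RATE_HZ = 25
RVQ_DEPTH = 64

def tokens_for(seconds: float, rvq_depth: int = RVQ_DEPTH) -> int:
    """Tokens the LLM must emit to cover `seconds` of audio at full depth."""
    return int(seconds * FRAME_RATE_HZ * rvq_depth)

print(tokens_for(1.0))  # 1600 tokens per second
print(tokens_for(2.0))  # 3200 tokens per (assumed) 2 s chunk
```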
Finetuning: free Colab TPU v2-8 via `Magenta_RT_Finetune.ipynb`; customize it to your own audio catalog. The official Colab demos support live generation, finetuning, and live audio injection (audio injection: mix user audio with the model output and feed the mix back as context for the next generation chunk).
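The audio-injection step described above reduces to a per-sample mix of the model's last chunk with the user's input. A minimal sketch, assuming equal-length sample buffers and a linear gain (the function name and gain value are illustrative):

```python
# "Audio injection": mix user audio with the model's output chunk; the
# result becomes the context for the next generation step.
def inject(model_chunk, user_chunk, user_gain=0.5):
    """Linearly mix two equal-length sample lists."""
    assert len(model_chunk) == len(user_chunk)
    return [m * (1.0 - user_gain) + u * user_gain
            for m, u in zip(model_chunk, user_chunk)]

context = inject([0.5, 1.0], [1.0, 0.0], user_gain=0.5)
print(context)  # [0.75, 0.5]
```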
### Magenta Classic – MIDI / Symbolic
MusicRNN implements Magenta's LSTM-based language models: MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.
| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
| MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
| MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
| MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
| MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot) | Harmonic voice generation |
| MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE for melody, drum, and trio loops | Latent interpolation, style morphing |
| MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
| MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
| MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
| MC-010 | Coconet | ONNX | Counterpoint by convolution; completes partial scores | Harmony / counterpoint filler |
### Magenta Classic – Audio / Timbre
| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
| MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
| MA-003 | DDSP Encoder | ONNX | Audio β harmonic + noise params | Timbre analysis |
| MA-004 | DDSP Decoder | ONNX | Harmonic params β audio | Timbre resynthesis |
| MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
| MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
| MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
### LLM / Vision Control
| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |
Format tiers:

- `q4_k_m.gguf`: default (recommended, ~1.5 GB)
- `q2_k.gguf`: lite tier (fastest, smallest)
- `f16.gguf`: full-quality reference
Runtime: the `llama-cpp-v3` Rust crate with a Vulkan backend, the same stack as LM Studio. No ROCm or CUDA needed on Windows.
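Whatever runtime loads the GGUF, the app still has to validate the structured JSON that Gemma-3N is prompted to emit for camera control. A hypothetical sketch of that validation step; the field names (`mood`, `energy`, `key`) and the 0–1 energy range are assumptions, not the app's actual schema:

```python
import json

# Assumed control-message schema for LV-001 output.
REQUIRED = {"mood": str, "energy": float, "key": str}

def parse_control(raw: str) -> dict:
    """Parse a model response and check the expected control fields."""
    msg = json.loads(raw)
    for field, typ in REQUIRED.items():
        if not isinstance(msg.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    if not 0.0 <= msg["energy"] <= 1.0:
        raise ValueError("energy out of range")
    return msg

ctrl = parse_control('{"mood": "dark", "energy": 0.7, "key": "F# minor"}')
print(ctrl["energy"])  # 0.7
```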
## Repository Structure
```
Ashiedu/Synesthesia/
│
├── manifest.json            # authoritative model registry
│
├── magenta_rt/
│   ├── llm/                 # MRT-001: JAX checkpoint + ONNX export
│   ├── spectrostream/
│   │   ├── encoder_fp32.onnx
│   │   ├── encoder_fp16.onnx
│   │   ├── decoder_fp32.onnx
│   │   └── decoder_fp16.onnx
│   └── musiccoca/
│       ├── text_fp32.onnx
│       ├── text_fp16.onnx
│       ├── audio_fp32.onnx
│       └── audio_fp16.onnx
│
├── midi/
│   ├── perfrnn/             # MC-001: fp32 / fp16 / int8
│   ├── melody_rnn/          # MC-002
│   ├── drums_rnn/           # MC-003
│   ├── improv_rnn/          # MC-004
│   ├── polyphony_rnn/       # MC-005
│   ├── musicvae/            # MC-006: encoder + decoder
│   ├── groovae/             # MC-007
│   ├── midime/              # MC-008
│   ├── music_transformer/   # MC-009
│   └── coconet/             # MC-010
│
├── audio/
│   ├── gansynth/            # MA-001: fp32 / fp16
│   ├── nsynth/              # MA-002
│   ├── ddsp/                # MA-003+004: encoder + decoder
│   ├── piano_genie/         # MA-005
│   ├── onsets_and_frames/   # MA-006
│   └── spice/               # MA-007
│
└── llm/
    └── gemma3n_e2b/
        ├── q4_k_m.gguf      # LV-001: default
        ├── q2_k.gguf
        └── f16.gguf
```
Each subdirectory contains a README.md with input/output shapes,
export commands, and Burn compatibility status.
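Since `manifest.json` is the authoritative registry, a loader would resolve model IDs through it rather than hard-coding paths. A sketch under an assumed schema (the `id → {path, format}` layout shown here is illustrative; the actual file is authoritative):

```python
import json

# Assumed manifest.json shape for illustration only.
SAMPLE_MANIFEST = json.dumps({
    "MC-001": {"path": "midi/perfrnn/fp16.onnx", "format": "onnx"},
    "LV-001": {"path": "llm/gemma3n_e2b/q4_k_m.gguf", "format": "gguf"},
})

def resolve(manifest_json: str, model_id: str) -> str:
    """Map a registry ID (e.g. MC-001) to its repo-relative weight path."""
    entry = json.loads(manifest_json).get(model_id)
    if entry is None:
        raise KeyError(f"unknown model id: {model_id}")
    return entry["path"]

print(resolve(SAMPLE_MANIFEST, "MC-001"))  # midi/perfrnn/fp16.onnx
```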
## Quality Tiers (ONNX models)
| Tier | Suffix | VRAM est. | Use case |
|---|---|---|---|
| Full | `_fp32.onnx` | ~2–4× Half | Reference quality, CI validation |
| Half | `_fp16.onnx` | Baseline | Default; recommended for RX 6700 XT |
| Lite | `_int8.onnx` | ~0.5× Half | Lowest latency (MIDI models only) |
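The tiers map directly onto filename suffixes in the repo layout. A small helper sketch (which tiers exist per model varies, and int8 is MIDI-only, so availability stays with the caller; the function name is illustrative):

```python
# Quality tier -> filename suffix, per the table above.
SUFFIX = {"full": "fp32", "half": "fp16", "lite": "int8"}

def tier_path(model_dir: str, tier: str = "half") -> str:
    """Build the repo-relative path for a model at a given quality tier."""
    return f"{model_dir}/{SUFFIX[tier]}.onnx"

print(tier_path("midi/perfrnn"))            # midi/perfrnn/fp16.onnx
print(tier_path("audio/gansynth", "full"))  # audio/gansynth/fp32.onnx
```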
## Pulling Models in Rust
```rust
use hf_hub::api::sync::Api;

/// Download one file from the repo. Cached under
/// ~/.cache/huggingface/hub/ on subsequent calls.
pub fn pull(repo_path: &str) -> anyhow::Result<std::path::PathBuf> {
    let api = Api::new()?;
    let repo = api.model("Ashiedu/Synesthesia".to_string());
    Ok(repo.get(repo_path)?)
}

// Example:
// let path = pull("midi/perfrnn/fp16.onnx")?;
```
## Pulling Models in Python
```python
from huggingface_hub import snapshot_download, hf_hub_download

# Pull everything
snapshot_download("Ashiedu/Synesthesia", local_dir="./models")

# Pull one file
hf_hub_download(
    repo_id="Ashiedu/Synesthesia",
    filename="midi/perfrnn/fp16.onnx",
    local_dir="./models",
)
```
## Export Workflow (Colab)
All models are exported from Colab and pushed here. The generic workflow:
```python
# 1. Pull existing checkpoint (if updating)
from huggingface_hub import snapshot_download
snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)

# 2. Clone Magenta source
# !git clone https://github.com/magenta/magenta
# !git clone https://github.com/magenta/magenta-realtime

# 3. Export to ONNX (varies per model; see each model's README)
#    Magenta Classic: tf2onnx
#    Magenta RT:      JAX -> ONNX via jax2onnx or flax export
#    Gemma-3N:        Unsloth -> GGUF

# 4. Quantize
from onnxruntime.quantization import quantize_dynamic, QuantType
import onnxconverter_common as occ
import onnx

fp32 = onnx.load("model.onnx")
fp16 = occ.convert_float_to_float16(fp32, keep_io_types=True)
onnx.save(fp16, "model_fp16.onnx")
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# 5. Push to HF
from huggingface_hub import HfApi
api = HfApi(token=HF_TOKEN)  # set in Colab Secrets
api.upload_file(
    path_or_fileobj="model_fp16.onnx",
    path_in_repo="midi/perfrnn/fp16.onnx",
    repo_id="Ashiedu/Synesthesia",
    commit_message="MC-001 Performance RNN fp16",
)
```
Gemini on Colab: point Gemini at this README and the model's subdirectory README as context. Gemini can execute the export-and-push workflow without GitHub integration; it only needs Python and your HF token in Colab Secrets.
## Burn Compatibility Tracking
CI attempts a weekly `burn-onnx` ModelGen run on each exported model. Models migrate from the ORT fallback to Burn as op coverage matures.
| Model | Burn target | ORT fallback | Last checked |
|---|---|---|---|
| DDSP enc/dec | ✓ | ✓ | — |
| GANSynth | ✓ | ✓ | — |
| NSynth | ✓ | ✓ | — |
| Piano Genie | ✓ | ✓ | — |
| Performance RNN | pending (LSTM ops) | ✓ | — |
| Melody RNN | pending (LSTM ops) | ✓ | — |
| Drums RNN | pending (LSTM ops) | ✓ | — |
| Improv RNN | pending (LSTM ops) | ✓ | — |
| Polyphony RNN | pending (LSTM ops) | ✓ | — |
| MusicVAE | pending (BiLSTM ops) | ✓ | — |
| Coconet | pending (Conv ops) | ✓ | — |
| Music Transformer | pending (Attention ops) | ✓ | — |
| Onsets & Frames | pending (Conv+LSTM ops) | ✓ | — |
| SpectroStream | pending (Conv ops) | ✓ | — |
| MusicCoCa | pending (ViT+Transformer ops) | ✓ | — |
| Gemma-3N | N/A (llama.cpp) | — | — |
## Training Philosophy
Train after the app works. The interface ships first. Training data is determined by what the working app actually receives as input in practice. Fine-tune on your own audio and MIDI once the signal chain is wired.
Tentative fine-tuning order once the app is functional:
- Performance RNN: live MIDI from the Track Mixer
- MusicVAE / GrooVAE: latent interpolation between patches
- GANSynth: timbre generation from pitch + latent input
- DDSP: resynthesis of GANSynth outputs
- Magenta RT: full audio, conditioned on your own catalog
- Gemma-3N: camera → mood/energy, trained on your session recordings
## License
- Codebase: Apache 2.0
- Magenta Classic weights: Apache 2.0
- Magenta RT weights: Apache 2.0 with additional bespoke terms
- Gemma-3N: Gemma Terms of Use
Individual model directories note any additional upstream license terms.
## Links
- App: kryptodogg/synesthesia
- Magenta RT: magenta/magenta-realtime
- Magenta Classic: magenta/magenta
- HF Model Card: google/magenta-realtime
- Roadmap: GitHub Issues, `lane:ml` label