Synesthesia — AI Music Models

ONNX and GGUF model weights for Synesthesia, a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.

Synesthesia brings together every open-weights model from Magenta Classic and Magenta RT under one repo, exportable to ONNX for local inference and continuously fine-tunable via free Google Colab notebooks.


Inference Runtimes

| Runtime | Models | Backend | Notes |
|---|---|---|---|
| Burn wgpu | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
| ORT + DirectML | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
| llama.cpp + Vulkan | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
| Magenta RT (JAX) | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |

Vulkan works on AMD without ROCm on Windows 11. All runtimes target the RX 6700 XT.


Model Inventory

Magenta RT (Real-Time Audio Generation)

Magenta RT is a pipeline of three components: SpectroStream (the audio codec), MusicCoCa (style embeddings), and an encoder-decoder transformer LLM. It is the only open-weights model that supports real-time, continuous musical audio generation.

It is an 800-million-parameter autoregressive transformer trained on ~190k hours of stock music, using 38% fewer parameters than Stable Audio Open and 77% fewer than MusicGen Large.

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
| MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48kHz stereo, 25Hz, 64 RVQ) | Audio tokenizer |
| MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48kHz stereo audio | Audio detokenizer |
| MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
| MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |

Finetuning: free Colab TPU v2-8 via Magenta_RT_Finetune.ipynb, customizable to your own audio catalog. The official Colab demos support live generation, finetuning, and live audio injection: the user's audio is mixed with the model's output and fed back as context for the next generation chunk.
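
The audio-injection loop lends itself to a short sketch. This is an illustrative toy in plain Python, not Magenta RT's actual API; the chunk layout and the 0.5 mix gain are assumptions:

```python
def inject(user_chunk, model_chunk, gain=0.5):
    """Blend user audio into the model's output, sample by sample.
    gain is the share of the user signal in the mix (illustrative value)."""
    return [gain * u + (1.0 - gain) * m for u, m in zip(user_chunk, model_chunk)]

def generation_loop(model_step, user_chunks):
    """Run the injection loop: each mixed chunk becomes the next context.
    model_step is a stand-in for the SpectroStream + LLM pipeline."""
    context = [0.0] * len(user_chunks[0])  # silent initial context
    output = []
    for user in user_chunks:
        chunk = model_step(context)        # generate from current context
        context = inject(user, chunk)      # mix, then feed back as context
        output.append(chunk)
    return output
```

The real pipeline operates on RVQ token streams rather than raw samples; the point here is only the feedback topology.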


Magenta Classic — MIDI / Symbolic

MusicRNN implements Magenta's LSTM-based language models: MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
| MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
| MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
| MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
| MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot) | Harmonic voice generation |
| MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE: melody, drum, trio loops | Latent interpolation, style morphing |
| MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
| MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
| MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
| MC-010 | Coconet | ONNX | Counterpoint by convolution: complete partial scores | Harmony / counterpoint filler |
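
MusicVAE's "latent interpolation" role (MC-006) boils down to decoding points along a path between two encoded loops. A minimal sketch of the path construction, with the encoder and decoder calls left out (real implementations often use spherical rather than linear interpolation):

```python
def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors at position t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

def morph_path(z_a, z_b, steps):
    """Evenly spaced latent points from z_a to z_b, endpoints included.
    Decode each point with the MusicVAE decoder to hear the morph."""
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```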

Magenta Classic — Audio / Timbre

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
| MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
| MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis |
| MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis |
| MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
| MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
| MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
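
DDSP's decoder (MA-003/004) drives an explicit synthesizer: a bank of harmonic sinusoids plus filtered noise. A toy version of the harmonic half, with a fixed f0 and static per-harmonic amplitudes (the real decoder predicts both per frame; the sample rate and length here are arbitrary):

```python
import math

def harmonic_synth(f0, amps, sample_rate=16000, n_samples=160):
    """Sum sinusoids at integer multiples of f0, weighted by amps.
    Static, mono toy version of DDSP's harmonic oscillator bank."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.sin(2.0 * math.pi * f0 * (k + 1) * t)
                       for k, a in enumerate(amps)))
    return out
```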

LLM / Vision Control

| ID | Model | Format | Task | Synesthesia Role |
|---|---|---|---|---|
| LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |

Format tiers:

  • q4_k_m.gguf — default (recommended, ~1.5GB)
  • q2_k.gguf — lite tier (fastest, smallest)
  • f16.gguf — full quality reference

Runtime: llama-cpp-v3 Rust crate with Vulkan backend. Same stack as LM Studio — no ROCm, no CUDA needed on Windows.
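
The camera path produces "structured JSON" control messages. The schema below (mood, energy, key) is a hypothetical example of what such a message could look like, with a validator for the three fields the table mentions; none of these field names are a documented Synesthesia format:

```python
import json

# Hypothetical control-message schema for the camera -> synth path.
VALID_KEYS = {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"}

def parse_control(raw):
    """Parse and validate one model response; raise ValueError if malformed."""
    msg = json.loads(raw)
    if not isinstance(msg.get("mood"), str):
        raise ValueError("mood must be a string")
    energy = msg.get("energy")
    if not isinstance(energy, (int, float)) or not 0.0 <= energy <= 1.0:
        raise ValueError("energy must be a number in [0, 1]")
    if msg.get("key") not in VALID_KEYS:
        raise ValueError("key must be a pitch class like 'F#'")
    return msg
```

Validating model output before it touches the synth engine matters here because small quantized models occasionally emit malformed JSON.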


Repository Structure

Ashiedu/Synesthesia/
│
├── manifest.json                    ← authoritative model registry
│
├── magenta_rt/
│   ├── llm/                         ← MRT-001: JAX checkpoint + ONNX export
│   ├── spectrostream/
│   │   ├── encoder_fp32.onnx
│   │   ├── encoder_fp16.onnx
│   │   ├── decoder_fp32.onnx
│   │   └── decoder_fp16.onnx
│   └── musiccoca/
│       ├── text_fp32.onnx
│       ├── text_fp16.onnx
│       ├── audio_fp32.onnx
│       └── audio_fp16.onnx
│
├── midi/
│   ├── perfrnn/                     ← MC-001: fp32 / fp16 / int8
│   ├── melody_rnn/                  ← MC-002
│   ├── drums_rnn/                   ← MC-003
│   ├── improv_rnn/                  ← MC-004
│   ├── polyphony_rnn/               ← MC-005
│   ├── musicvae/                    ← MC-006: encoder + decoder
│   ├── groovae/                     ← MC-007
│   ├── midime/                      ← MC-008
│   ├── music_transformer/           ← MC-009
│   └── coconet/                     ← MC-010
│
├── audio/
│   ├── gansynth/                    ← MA-001: fp32 / fp16
│   ├── nsynth/                      ← MA-002
│   ├── ddsp/                        ← MA-003+004: encoder + decoder
│   ├── piano_genie/                 ← MA-005
│   ├── onsets_and_frames/           ← MA-006
│   └── spice/                       ← MA-007
│
└── llm/
    └── gemma3n_e2b/
        ├── q4_k_m.gguf              ← LV-001: default
        ├── q2_k.gguf
        └── f16.gguf

Each subdirectory contains a README.md with input/output shapes, export commands, and Burn compatibility status.
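
Since manifest.json is the authoritative registry, runtime code should resolve model IDs through it rather than hard-coding paths. Assuming each entry maps an ID to a repo path and its available tiers (an illustrative schema; the real manifest's field names may differ), a lookup helper might be:

```python
def resolve(manifest, model_id, tier="fp16"):
    """Map a model ID plus quality tier to a repo-relative ONNX path.
    Manifest schema here is a guess: {"MC-001": {"path": ..., "tiers": [...]}}."""
    entry = manifest[model_id]
    if tier not in entry["tiers"]:
        raise KeyError(f"{model_id} has no {tier} tier")
    return f"{entry['path']}/{tier}.onnx"
```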


Quality Tiers (ONNX models)

| Tier | Suffix | VRAM est. | Use case |
|---|---|---|---|
| Full | _fp32.onnx | ~2–4× Half | Reference quality, CI validation |
| Half | _fp16.onnx | Baseline | Default, recommended for RX 6700 XT |
| Lite | _int8.onnx | ~0.5× Half | Lowest latency (MIDI models only) |
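
A tier-picking helper matching the table above; the VRAM thresholds are illustrative guesses, not measured numbers:

```python
def pick_tier(vram_free_mb, is_midi_model):
    """Choose an ONNX filename suffix from free VRAM, per the tier table.
    Thresholds are assumptions; int8 exists only for the MIDI models."""
    if is_midi_model and vram_free_mb < 1024:
        return "_int8.onnx"   # Lite: lowest latency, MIDI models only
    if vram_free_mb >= 8192:
        return "_fp32.onnx"   # Full: reference quality
    return "_fp16.onnx"       # Half: the default for the RX 6700 XT
```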

Pulling Models in Rust

```rust
use hf_hub::api::sync::Api;

/// Download a single file from the Synesthesia repo.
/// Files are cached under ~/.cache/huggingface/hub/.
pub fn pull(repo_path: &str) -> anyhow::Result<std::path::PathBuf> {
    let api = Api::new()?;
    let repo = api.model("Ashiedu/Synesthesia".to_string());
    Ok(repo.get(repo_path)?)
}

// Example: `?` requires a function returning anyhow::Result
fn main() -> anyhow::Result<()> {
    let path = pull("midi/perfrnn/fp16.onnx")?;
    println!("cached at {}", path.display());
    Ok(())
}
```

Pulling Models in Python

```python
from huggingface_hub import snapshot_download, hf_hub_download

# Pull everything
snapshot_download("Ashiedu/Synesthesia", local_dir="./models")

# Pull one file
hf_hub_download(
    repo_id="Ashiedu/Synesthesia",
    filename="midi/perfrnn/fp16.onnx",
    local_dir="./models",
)
```

Export Workflow (Colab)

All models are exported from Colab and pushed here. The generic workflow:

```python
# 1. Pull the existing checkpoint (if updating)
from huggingface_hub import snapshot_download
snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)

# 2. Clone Magenta source
# !git clone https://github.com/magenta/magenta
# !git clone https://github.com/magenta/magenta-realtime

# 3. Export to ONNX (varies per model; see each model's README)
#    Magenta Classic: tf2onnx
#    Magenta RT: JAX -> ONNX via jax2onnx or flax export
#    Gemma-3N: Unsloth -> GGUF

# 4. Quantize
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

fp32 = onnx.load("model.onnx")
fp16 = float16.convert_float_to_float16(fp32, keep_io_types=True)
onnx.save(fp16, "model_fp16.onnx")
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# 5. Push to HF
from huggingface_hub import HfApi
api = HfApi(token=HF_TOKEN)  # HF_TOKEN set in Colab Secrets
api.upload_file(
    path_or_fileobj="model_fp16.onnx",
    path_in_repo="midi/perfrnn/fp16.onnx",
    repo_id="Ashiedu/Synesthesia",
    commit_message="MC-001 Performance RNN fp16",
)
```

Gemini on Colab: Point Gemini at this README and the model's subdirectory README as context. Gemini can execute the export + push workflow without GitHub integration — it only needs Python and your HF token in Colab Secrets.


Burn Compatibility Tracking

A weekly CI job runs burn-onnx ModelGen against each exported model. Models migrate from the ORT fallback to Burn as op coverage matures.

| Model | Burn target | ORT fallback | Last checked |
|---|---|---|---|
| DDSP enc/dec | ✅ | ❌ | — |
| GANSynth | ✅ | ❌ | — |
| NSynth | ✅ | ❌ | — |
| Piano Genie | ✅ | ❌ | — |
| Performance RNN | 🔄 LSTM | ✅ | — |
| Melody RNN | 🔄 LSTM | ✅ | — |
| Drums RNN | 🔄 LSTM | ✅ | — |
| Improv RNN | 🔄 LSTM | ✅ | — |
| Polyphony RNN | 🔄 LSTM | ✅ | — |
| MusicVAE | 🔄 BiLSTM | ✅ | — |
| Coconet | 🔄 Conv | ✅ | — |
| Music Transformer | 🔄 Attention | ✅ | — |
| Onsets & Frames | 🔄 Conv+LSTM | ✅ | — |
| SpectroStream | 🔄 Conv | ✅ | — |
| MusicCoCa | 🔄 ViT+Transformer | ✅ | — |
| Gemma-3N | N/A — llama.cpp | ❌ | — |

Training Philosophy

Train after the app works. The interface ships first. Training data is determined by what the working app actually receives as input in practice. Fine-tune on your own audio and MIDI once the signal chain is wired.

Tentative fine-tuning order once the app is functional:

  1. Performance RNN — live MIDI from the Track Mixer
  2. MusicVAE / GrooVAE — latent interpolation between patches
  3. GANSynth — timbre generation from pitch + latent input
  4. DDSP — resynthesis of GANSynth outputs
  5. Magenta RT — full audio, conditioned on your own catalog
  6. Gemma-3N — camera → mood/energy trained on your session recordings

License

Individual model directories note any additional upstream license terms.

