# MIDI Gen AI
A custom 25M-parameter transformer trained from scratch on symbolic MIDI for real-time music continuation. Companion model to the midi-gen-ai source repo, designed to fit into a live-performance workflow inside Ableton Live.
## Model summary
| Architecture | GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2) |
|---|---|
| Parameters | 25M |
| Vocabulary | 641 tokens (custom event-based, via MidiTok MIDILike) |
| Context | 2048 tokens |
| Training data | Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduplicated (~158k MIDI files for this pilot) |
| Training infra | Lambda Labs H100 (1×) |
| Tokens / note | ~4 |
| Inference (CPU) | ~278 tok/s ≈ ~70 notes/s |
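For orientation, a minimal sketch of one decoder block matching the description above. The dimensions are illustrative, not the actual 25M config, and PyTorch's `scaled_dot_product_attention` stands in for FlashAttention-2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # normalize by root-mean-square only: no mean-centering, no bias
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def apply_rope(x, theta=10000.0):
    # x: (batch, heads, seq, head_dim); rotate channel pairs by position-dependent angles
    b, h, t, d = x.shape
    freqs = 1.0 / theta ** (torch.arange(0, d, 2, device=x.device) / d)
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]  # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class DecoderBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8, mlp_mult=4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.attn_norm, self.mlp_norm = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # SwiGLU MLP: SiLU-gated product of two up-projections
        hidden = mlp_mult * dim
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        q, k = apply_rope(q), apply_rope(k)
        # SDPA dispatches to a fused flash-attention kernel when one is available
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(y.transpose(1, 2).reshape(b, t, d))
        h = self.mlp_norm(x)
        return x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
```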
This is the pilot 25M version. A larger 200M run is planned.
## Versions
| subfolder | model | status |
|---|---|---|
| `v2-pilot/` | 25M custom transformer (Lambda-trained) | current best ✅ |
Future versions will be added as additional subfolders.
## Files (per version)
- `ckpt_final.pt` – PyTorch checkpoint (state dict + model config)
- `tokenizer.json` – saved MidiTok MIDILike tokenizer
## Usage
Direct, low-level:

```python
from huggingface_hub import hf_hub_download

# fetch (cached at ~/.cache/huggingface/)
ckpt = hf_hub_download("nicholasbien/midigenai", "ckpt_final.pt", subfolder="v2-pilot")
tok = hf_hub_download("nicholasbien/midigenai", "tokenizer.json", subfolder="v2-pilot")

from midigenai.v2.generate_v2 import V2Generator

gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
```
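To inspect the token stream outside the generator wrapper, the saved tokenizer can also be loaded with MidiTok itself. A minimal sketch; loading via `params=` is standard MidiTok behavior, not a repo-specific API:

```python
from miditok import MIDILike

tokenizer = MIDILike(params=tok)  # `tok` is the tokenizer.json path downloaded above
print("vocab size:", len(tokenizer))

# Encode a seed file; depending on the tokenizer config this returns one
# TokSequence or a list of them (one per track)
seqs = tokenizer("seed.mid")
seq = seqs[0] if isinstance(seqs, list) else seqs
print(seq.tokens[:16])  # first few events, e.g. NoteOn / Velocity / TimeShift
```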
For higher-level, live use inside Ableton Live, go through ableton-mcp-pro.
## Training
See the v2 plan + scripts in the source repo.
- Standard next-token cross-entropy; AdamW (β = 0.9, 0.95), weight decay 0.1; cosine LR schedule with 2k-step warmup (sketched below)
- Effective batch of ~500k tokens/step (via gradient accumulation)
- Tempo is stripped during training and re-applied at decode time (the model is tempo-invariant)
- Prompts are BOS-prefixed; EOS terminates generation
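A minimal sketch of the optimizer and schedule described above. Hyperparameters come from the list; `model`, `batches`, the learning rate, and the total step count are placeholders:

```python
import math
import torch

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)

warmup_steps, total_steps = 2_000, 100_000  # warmup from the list above; total is illustrative

def lr_lambda(step):
    # linear warmup for 2k steps, then cosine decay to zero
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

accum_steps = 8  # chosen so accum_steps * tokens_per_microbatch ≈ 500k tokens/step
for i, (inputs, targets) in enumerate(batches):
    logits = model(inputs)
    # standard next-token cross-entropy over the 641-token event vocabulary
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```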
## Intended use
- Generating melodic / harmonic continuations from short MIDI prompts
- Jamming and sketching inside a DAW
- Research baselines for MIDI-only generative models at this scale
Out of scope:
- Audio output (this model is symbolic only)
- Text-conditioned generation
- Lyrics / vocal generation
## Limitations
- Pilot-scale run; quality is expected to improve with more parameters and training data
- Single-track / piano-style prompts work best (training corpus is piano-heavy)
- Drum prompts are weakly supported
- Occasional non-diatonic drift; for tonal applications, consider snapping output pitches to a scale as a post-processing step (sketch below)
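One cheap way to do that snapping, sketched with pretty_midi (an external library, not part of this repo); the C-major target scale and nearest-pitch rule are illustrative choices:

```python
import pretty_midi

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the target scale

def snap_pitch(pitch, scale=C_MAJOR):
    # move a MIDI pitch to the closest pitch whose pitch class is in the scale
    return min(
        (p for p in range(max(0, pitch - 6), min(128, pitch + 7)) if p % 12 in scale),
        key=lambda p: abs(p - pitch),
    )

pm = pretty_midi.PrettyMIDI("out.mid")
for inst in pm.instruments:
    if not inst.is_drum:  # leave drum tracks untouched
        for note in inst.notes:
            note.pitch = snap_pitch(note.pitch)
pm.write("out_snapped.mid")
```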
## License
MIT.