# MIDI Gen AI
A custom 25M-parameter transformer trained from scratch on symbolic MIDI for real-time music continuation. Companion model to the midi-gen-ai source repo, designed to fit into a live-performance workflow inside Ableton Live.
## Model summary
| Architecture | GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2) |
|---|---|
| Parameters | 25M |
| Vocabulary | 641 tokens (custom event-based, via MidiTok MIDILike) |
| Context | 2048 tokens |
| Training data | Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduplicated (~158k MIDI files for this pilot) |
| Training infra | Lambda Labs H100 (1×) |
| Tokens / note | ~4 |
| Inference (CPU) | ~278 tok/s ≈ ~70 notes/s |
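For orientation, a minimal sketch of one decoder block matching the description above. The dimensions are illustrative, not the actual 25M config, and PyTorch's `scaled_dot_product_attention` stands in for FlashAttention-2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # normalize by root-mean-square only: no mean-centering, no bias
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def apply_rope(x, theta=10000.0):
    # x: (batch, heads, seq, head_dim); rotate channel pairs by position-dependent angles
    b, h, t, d = x.shape
    freqs = 1.0 / theta ** (torch.arange(0, d, 2, device=x.device) / d)
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]  # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class DecoderBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8, mlp_mult=4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.attn_norm, self.mlp_norm = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # SwiGLU MLP: SiLU-gated product of two up-projections
        hidden = mlp_mult * dim
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        q, k = apply_rope(q), apply_rope(k)
        # SDPA dispatches to a fused flash-attention kernel when one is available
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(y.transpose(1, 2).reshape(b, t, d))
        h = self.mlp_norm(x)
        return x + self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
```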
This is the pilot 25M version. A larger 200M run is planned.
## Versions
| subfolder | model | status |
|---|---|---|
| `v2-pilot/` | 25M custom transformer (Lambda-trained) | current best ✅ |
Future versions will be added as additional subfolders.
## Files (per version)
- `ckpt_final.pt` – PyTorch checkpoint (state dict + model config)
- `tokenizer.json` – saved MidiTok MIDILike tokenizer
## Usage
Direct, low-level:

```python
from huggingface_hub import hf_hub_download

# fetch (cached at ~/.cache/huggingface/)
ckpt = hf_hub_download("nicholasbien/midigenai", "ckpt_final.pt", subfolder="v2-pilot")
tok = hf_hub_download("nicholasbien/midigenai", "tokenizer.json", subfolder="v2-pilot")

from midigenai.v2.generate_v2 import V2Generator

gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
```
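To inspect the token stream outside the generator wrapper, the saved tokenizer can also be loaded with MidiTok itself. A minimal sketch; loading via `params=` is standard MidiTok behavior, not a repo-specific API:

```python
from miditok import MIDILike

tokenizer = MIDILike(params=tok)  # `tok` is the tokenizer.json path downloaded above
print("vocab size:", len(tokenizer))

# Encode a seed file; depending on the tokenizer config this returns one
# TokSequence or a list of them (one per track)
seqs = tokenizer("seed.mid")
seq = seqs[0] if isinstance(seqs, list) else seqs
print(seq.tokens[:16])  # first few events, e.g. NoteOn / Velocity / TimeShift
```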
For higher-level, live use inside Ableton Live, go through ableton-mcp-pro.
## Training
See the v2 plan + scripts in the source repo.
- Standard next-token cross-entropy; AdamW (β = 0.9, 0.95), weight decay 0.1; cosine LR schedule with 2k-step warmup (sketched below)
- Effective batch of ~500k tokens/step (via gradient accumulation)
- Tempo is stripped during training and re-applied at decode time (the model is tempo-invariant)
- Prompts are BOS-prefixed; EOS terminates generation
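A minimal sketch of the optimizer and schedule described above. Hyperparameters come from the list; `model`, `batches`, the learning rate, and the total step count are placeholders:

```python
import math
import torch

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)

warmup_steps, total_steps = 2_000, 100_000  # warmup from the list above; total is illustrative

def lr_lambda(step):
    # linear warmup for 2k steps, then cosine decay to zero
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

accum_steps = 8  # chosen so accum_steps * tokens_per_microbatch ≈ 500k tokens/step
for i, (inputs, targets) in enumerate(batches):
    logits = model(inputs)
    # standard next-token cross-entropy over the 641-token event vocabulary
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```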
## Intended use
- Generating melodic / harmonic continuations from short MIDI prompts
- Jamming and sketching inside a DAW
- Research baselines for MIDI-only generative models at this scale
Out of scope:
- Audio output (this model is symbolic only)
- Text-conditioned generation
- Lyrics / vocal generation
## Limitations
- Pilot-scale run; quality is expected to improve with more parameters and training data
- Single-track / piano-style prompts work best (training corpus is piano-heavy)
- Drum prompts are weakly supported
- Occasional non-diatonic drift; for tonal applications, consider snapping output pitches to a scale as a post-processing step (sketch below)
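One cheap way to do that snapping, sketched with pretty_midi (an external library, not part of this repo); the C-major target scale and nearest-pitch rule are illustrative choices:

```python
import pretty_midi

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the target scale

def snap_pitch(pitch, scale=C_MAJOR):
    # move a MIDI pitch to the closest pitch whose pitch class is in the scale
    return min(
        (p for p in range(max(0, pitch - 6), min(128, pitch + 7)) if p % 12 in scale),
        key=lambda p: abs(p - pitch),
    )

pm = pretty_midi.PrettyMIDI("out.mid")
for inst in pm.instruments:
    if not inst.is_drum:  # leave drum tracks untouched
        for note in inst.notes:
            note.pitch = snap_pitch(note.pitch)
pm.write("out_snapped.mid")
```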
## License
MIT.