MIDI Gen AI

A custom 25M-parameter transformer trained from scratch on symbolic MIDI for real-time music continuation. Companion model to the midi-gen-ai source repo, designed to fit inside a live Ableton Live workflow.

Model summary

Architecture: GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2)
Parameters: 25M
Vocabulary: 641 tokens (custom event-based, via MidiTok MIDILike)
Context: 2048 tokens
Training data: Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduplicated (~158k MIDI files in the pilot)
Training infra: Lambda Labs H100 (1×)
Tokens per note: ~4
Inference (CPU): ~278 tok/s ≈ ~70 notes/s
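For readers unfamiliar with the layer names above, here are minimal pure-Python sketches of RMSNorm and the SwiGLU gate. These are illustrative only; the actual model uses PyTorch modules operating on tensors.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the reciprocal root-mean-square (no mean subtraction,
    unlike LayerNorm), then apply a learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu(gate, up):
    """SwiGLU gating in the feed-forward block: SiLU(gate) * up, elementwise."""
    silu = lambda v: v / (1.0 + math.exp(-v))
    return [silu(g) * u for g, u in zip(gate, up)]
```

In the real feed-forward block, `gate` and `up` are two separate linear projections of the same input, and the gated product is projected back down to the model dimension.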

This is the pilot 25M version. A larger 200M run is planned.

Versions

subfolder | model | status
v2-pilot/ | 25M custom transformer (Lambda-trained) | current best ✅

Future versions will be added as additional subfolders.

Files (per version)

  • ckpt_final.pt — PyTorch checkpoint (state dict + model config)
  • tokenizer.json — saved MidiTok MIDILike tokenizer

Usage

Direct, low-level:

from huggingface_hub import hf_hub_download
from midigenai.v2.generate_v2 import V2Generator

# fetch checkpoint + tokenizer (cached under ~/.cache/huggingface/)
ckpt = hf_hub_download("nicholasbien/midigenai", "ckpt_final.pt", subfolder="v2-pilot")
tok  = hf_hub_download("nicholasbien/midigenai", "tokenizer.json", subfolder="v2-pilot")

gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
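To make the sampling knobs above concrete, here is a minimal pure-Python sketch of temperature-scaled top-k sampling; the generator's actual implementation may differ in details.

```python
import math
import random

def sample_top_k(logits, temperature=1.0, top_k=50, rng=random):
    """Sample a token index: keep only the k largest logits, divide by
    temperature, softmax, then draw from the resulting distribution."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                                  # subtract max for stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    r, acc = rng.random() * total, 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]
```

Lower temperature sharpens the distribution toward the most likely tokens; smaller `top_k` hard-limits how many candidates can be drawn at all.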

For higher-level, live use inside Ableton Live, drive the model via ableton-mcp-pro.

Training

See the v2 plan + scripts in the source repo.

  • Standard next-token cross-entropy loss; AdamW (β = 0.9, 0.95), weight decay 0.1; cosine LR schedule with 2k-step warmup
  • Batch size ~500k tokens/step (via gradient accumulation)
  • Tempo is stripped during training and re-applied at decode (the model is tempo-invariant)
  • Prompts are BOS-prefixed; EOS terminates generation
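The warmup-plus-cosine schedule above can be sketched as follows. The peak LR, minimum LR, and total step count here are illustrative assumptions, not values from the training scripts; only the 2k-step warmup is from the list above.

```python
import math

def cosine_lr(step, max_lr=3e-4, min_lr=3e-5, warmup=2000, total=100_000):
    """Linear warmup to max_lr over `warmup` steps, then cosine decay to min_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = min((step - warmup) / max(1, total - warmup), 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))   # 1 -> 0 over the decay
    return min_lr + coeff * (max_lr - min_lr)
```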

Intended use

  • Generating melodic / harmonic continuations from short MIDI prompts
  • Inside a DAW for jamming / sketching
  • Research baselines for MIDI-only generative models at this scale

Out of scope:

  • Audio output (this model is symbolic only)
  • Text-conditioned generation
  • Lyrics / vocal generation

Limitations

  • Pilot run; quality scales with parameter count and training data
  • Single-track / piano-style prompts work best (training corpus is piano-heavy)
  • Drum prompts are weakly supported
  • Outputs can drift outside the prompt's key; for tonal applications, consider snapping pitches to a scale as a post-processing step
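The snap-to-scale mitigation mentioned in the last bullet can be as simple as quantizing each generated MIDI pitch to the nearest pitch class in a target scale. This helper is a sketch (the function name and C-major default are illustrative, not part of the repo):

```python
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of C major

def snap_to_scale(pitch, scale=C_MAJOR):
    """Return the in-scale MIDI pitch nearest to `pitch` (ties resolve downward)."""
    for offset in range(12):
        if (pitch - offset) % 12 in scale:
            down = pitch - offset
            break
    for offset in range(12):
        if (pitch + offset) % 12 in scale:
            up = pitch + offset
            break
    return down if (pitch - down) <= (up - pitch) else up
```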

License

MIT.
