---
license: mit
tags:
- music
- midi
- music-generation
- transformer
- pytorch
library_name: pytorch
pipeline_tag: text-generation
---

# MIDI Gen AI

A custom 25M-parameter transformer trained from scratch on symbolic MIDI for **real-time music continuation**. Companion model to the [`midi-gen-ai`](https://github.com/nicholasbien/midi-gen-ai) source repo, designed to fit inside a live Ableton Live workflow.

## Model summary

| | |
|---|---|
| Architecture | GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2) |
| Parameters | 25M |
| Vocabulary | 641 (custom event-based via [MidiTok MIDILike](https://github.com/Natooz/MidiTok)) |
| Context | 2048 tokens |
| Training data | Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduped (~158k MIDIs pilot) |
| Training infra | Lambda Labs H100 (1×) |
| Tokens / note | ~4 |
| Inference (CPU) | ~278 tok/s ≈ ~70 notes/s |

This is the **pilot 25M** version. A larger 200M run is planned.

## Versions

| subfolder | model | status |
|---|---|---|
| `v2-pilot/` | 25M custom transformer (Lambda-trained) | current best ✅ |

Future versions will be added as additional subfolders.

## Files (per version)

- `ckpt_final.pt`: PyTorch checkpoint (state dict + model config)
- `tokenizer.json`: saved [MidiTok MIDILike](https://github.com/Natooz/MidiTok) tokenizer

## Usage

Direct, low-level:

```python
from huggingface_hub import hf_hub_download

# fetch (cached at ~/.cache/huggingface/)
ckpt = hf_hub_download("nicholasbien/midigenai", "ckpt_final.pt", subfolder="v2-pilot")
tok = hf_hub_download("nicholasbien/midigenai", "tokenizer.json", subfolder="v2-pilot")

from midigenai.v2.generate_v2 import V2Generator

gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
```

Higher level (via [ableton-mcp-pro](https://github.com/nicholasbien/ableton-mcp-pro)) for live use inside Ableton Live.

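
For real-time use it can help to budget latency from the figures in the model summary. The sketch below is back-of-the-envelope arithmetic only: it assumes the ~278 tok/s CPU rate and ~4 tokens per note carry over to your hardware and prompt, and `generation_budget` is a hypothetical helper, not part of `midigenai`.

```python
# Rough latency budget from the model-summary figures (assumed, not measured here).
TOKENS_PER_SECOND = 278.0  # approximate CPU throughput from the summary table
TOKENS_PER_NOTE = 4.0      # approximate tokens emitted per note


def generation_budget(max_new_tokens: int) -> tuple[float, float]:
    """Return (estimated seconds, estimated notes) for one generation call."""
    seconds = max_new_tokens / TOKENS_PER_SECOND
    notes = max_new_tokens / TOKENS_PER_NOTE
    return seconds, notes


seconds, notes = generation_budget(512)
print(f"~{seconds:.1f} s for ~{notes:.0f} notes")  # ~1.8 s for ~128 notes
```

With `max_new_tokens=512`, as in the call above, this works out to roughly 1.8 s of CPU time for about 128 notes. Real figures will vary with hardware and context length.
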

## Training

See the [v2 plan + scripts](https://github.com/nicholasbien/midi-gen-ai) in the source repo.

- Standard next-token CE; AdamW (β=0.9, 0.95), wd 0.1; cosine LR with 2k warmup
- Batch ~500k tokens/step (gradient accumulation)
- Tempo stripped at training, re-applied at decode (model is tempo-invariant)
- BOS-prefixed prompts; EOS terminates generation

## Intended use

- Generating melodic / harmonic continuations from short MIDI prompts
- Inside a DAW for jamming / sketching
- Research baselines for MIDI-only generative models at this scale

Out of scope:

- Audio output (this model is symbolic only)
- Text-conditioned generation
- Lyrics / vocal generation

## Limitations

- Pilot run; quality scales with parameter count and training data
- Single-track / piano-style prompts work best (training corpus is piano-heavy)
- Drum prompts are weakly supported
- Some non-diatonic drift; consider post-snap to scale for tonal applications

## License

[MIT](https://github.com/nicholasbien/midi-gen-ai/blob/main/LICENSE).
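
## Appendix: post-snap sketch

The Limitations section suggests snapping generated notes to a scale for tonal applications. One way to do that is sketched below on a plain list of MIDI pitch numbers. This is an illustrative assumption, not code from the source repo: `snap_to_scale` and `MAJOR_SCALE` are hypothetical names, and reading pitches from and writing them back to a MIDI file is left to whichever MIDI library you already use.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the root


def snap_to_scale(pitch: int, root: int = 0, scale=MAJOR_SCALE) -> int:
    """Move a MIDI pitch (0-127) to the nearest pitch in the given scale.

    `root` is the pitch class of the tonic (0 = C). Ties resolve downward.
    """
    allowed = {(root + step) % 12 for step in scale}
    # Search outward from the original pitch, checking below before above.
    for offset in range(12):
        for candidate in (pitch - offset, pitch + offset):
            if 0 <= candidate <= 127 and candidate % 12 in allowed:
                return candidate
    return pitch  # only reached if the scale is empty


melody = [60, 61, 63, 66, 70]  # C, C#, D#, F#, A#
print([snap_to_scale(p) for p in melody])  # [60, 60, 62, 65, 69]
```

Snapping every note is a blunt tool: it removes intentional chromatic passing tones along with the drift, so it may be worth applying only to notes above some duration or on strong beats.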