---
license: mit
tags:
- music
- midi
- music-generation
- transformer
- pytorch
library_name: pytorch
pipeline_tag: text-generation
---

# MIDI Gen AI
|
|
A custom 25M-parameter transformer trained from scratch on symbolic MIDI for **real-time music continuation**. It is the companion model to the [`midi-gen-ai`](https://github.com/nicholasbien/midi-gen-ai) source repo and is designed to fit into a real-time Ableton Live workflow.
|
|
## Model summary

| | |
|---|---|
| Architecture | GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2) |
| Parameters | 25M |
| Vocabulary | 641 tokens (custom event-based vocabulary built with [MidiTok MIDILike](https://github.com/Natooz/MidiTok)) |
| Context | 2048 tokens |
| Training data | Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduplicated (~158k MIDI files in the pilot) |
| Training infra | 1× H100 (Lambda Labs) |
| Tokens per note | ~4 |
| Inference speed (CPU) | ~278 tok/s (≈70 notes/s) |

This is the **25M pilot** version. A larger 200M-parameter run is planned.
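
The throughput rows in the table are tied together by the tokens-per-note ratio. As a rough worked check, the plain-Python sketch below uses only the card's approximate figures to convert ~278 tok/s into notes per second, estimate CPU time for a 512-token continuation, and estimate how many notes fit in the 2048-token context, assuming the ~4 tokens/note average holds. The helper names are illustrative and not part of `midigenai`.

```python
# Approximate figures from the model summary table (not new measurements).
CPU_TOKENS_PER_SECOND = 278.0  # ~278 tok/s on CPU
TOKENS_PER_NOTE = 4.0          # ~4 tokens per note (assumed average)
CONTEXT_TOKENS = 2048          # model context length


def notes_per_second(tok_per_s: float = CPU_TOKENS_PER_SECOND,
                     tok_per_note: float = TOKENS_PER_NOTE) -> float:
    """Convert token throughput into approximate note throughput."""
    return tok_per_s / tok_per_note


def generation_seconds(new_tokens: int,
                       tok_per_s: float = CPU_TOKENS_PER_SECOND) -> float:
    """Rough time to sample `new_tokens` tokens; ignores prompt processing."""
    return new_tokens / tok_per_s


print(f"{notes_per_second():.1f} notes/s")                    # 69.5 notes/s
print(f"{generation_seconds(512):.2f} s for 512 new tokens")  # 1.84 s for 512 new tokens
print(f"~{CONTEXT_TOKENS / TOKENS_PER_NOTE:.0f} notes fit in the context")  # ~512 notes ...
```

These are back-of-the-envelope estimates; real latency also depends on prompt length and hardware.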
|
|
## Versions

| subfolder | model | status |
|---|---|---|
| `v2-pilot/` | 25M custom transformer (Lambda-trained) | current best ✅ |

Future versions will be added as additional subfolders.
|
|
## Files (per version)

- `ckpt_final.pt`: PyTorch checkpoint (state dict + model config)
- `tokenizer.json`: saved [MidiTok MIDILike](https://github.com/Natooz/MidiTok) tokenizer
|
|
## Usage

Direct, low-level usage from Python:

```python
from huggingface_hub import hf_hub_download

from midigenai.v2.generate_v2 import V2Generator  # from the midi-gen-ai source repo

REPO_ID = "nicholasbien/midigenai"

# Fetch the pilot checkpoint and tokenizer (cached under ~/.cache/huggingface/)
ckpt = hf_hub_download(REPO_ID, "ckpt_final.pt", subfolder="v2-pilot")
tok = hf_hub_download(REPO_ID, "tokenizer.json", subfolder="v2-pilot")

gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")  # tokenize the seed MIDI to continue
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
```
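
The `temperature` and `top_k` arguments above are standard sampling controls. The following is a minimal, generic sketch of top-k sampling with temperature plus an EOS-terminated generation loop in plain Python. It is not `V2Generator`'s actual code: `sample_top_k`, `generate`, and `toy_logits` are hypothetical names, and a real model would return logits over the 641-token vocabulary instead of the toy 4-token one used here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def sample_top_k(logits: Sequence[float], temperature: float = 1.0, top_k: int = 50,
                 rng: Optional[random.Random] = None) -> int:
    """Sample one token id with temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    k = max(1, min(top_k, len(logits)))
    # Keep only the k highest-scoring token ids; the rest get zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits after dividing by the temperature
    # (subtracting the max keeps exp() from overflowing).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(prompt: List[int], logits_fn: Callable[[List[int]], Sequence[float]],
             eos_id: int, max_new_tokens: int = 512, temperature: float = 1.0,
             top_k: int = 50, rng: Optional[random.Random] = None) -> List[int]:
    """Extend the prompt until EOS is sampled or max_new_tokens is reached."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        tok = sample_top_k(logits_fn(seq), temperature, top_k, rng)
        if tok == eos_id:  # EOS ends generation and is not appended
            break
        seq.append(tok)
    return seq


# Hypothetical toy "model" over 4 tokens: id 0 stands in for BOS, id 3 for EOS.
def toy_logits(seq: List[int]) -> List[float]:
    return [0.0, 5.0, 0.0, 10.0] if len(seq) >= 3 else [0.0, 5.0, 0.0, -10.0]


print(generate([0], toy_logits, eos_id=3, top_k=1))  # [0, 1, 1]
```

Lower temperatures sharpen the distribution toward the highest-scoring tokens, and `top_k=1` reduces to greedy decoding, which is why the toy output is deterministic.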
|
|
For higher-level, live use inside Ableton Live, go through [ableton-mcp-pro](https://github.com/nicholasbien/ableton-mcp-pro).
|
|
## Training

See the [v2 plan and scripts](https://github.com/nicholasbien/midi-gen-ai) in the source repo.

- Standard next-token cross-entropy loss; AdamW (β₁ = 0.9, β₂ = 0.95), weight decay 0.1; cosine LR schedule with 2k warmup steps
- Batch size of ~500k tokens per optimizer step (via gradient accumulation)
- Tempo is stripped during training and re-applied at decode time, so the model is tempo-invariant
- Prompts are BOS-prefixed; an EOS token terminates generation
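
The schedule and batch-size bullets can be made concrete with a small sketch. It assumes linear warmup over the 2k steps followed by a half-cosine decay, and that every training sequence fills the full 2048-token context. The peak LR, minimum LR, total step count, and micro-batch size are placeholders because the card does not state them, and `lr_at` / `accumulation_steps` are hypothetical helpers, not functions from the training scripts.

```python
import math


def lr_at(step: int, peak_lr: float, total_steps: int,
          warmup_steps: int = 2_000, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def accumulation_steps(target_tokens: int, micro_batch: int, seq_len: int = 2048) -> int:
    """Micro-batches to accumulate so one optimizer step sees >= target_tokens."""
    return math.ceil(target_tokens / (micro_batch * seq_len))


# Placeholder values: the card does not give peak LR, total steps, or micro-batch size.
PEAK_LR, TOTAL_STEPS, MICRO_BATCH = 6e-4, 20_000, 16
steps = accumulation_steps(500_000, MICRO_BATCH)
print(steps, steps * MICRO_BATCH * 2048)  # -> 16 524288
print(lr_at(0, PEAK_LR, TOTAL_STEPS), lr_at(2_000, PEAK_LR, TOTAL_STEPS))

# In PyTorch the optimizer side would look like
# torch.optim.AdamW(model.parameters(), lr=PEAK_LR, betas=(0.9, 0.95), weight_decay=0.1),
# with each param group's LR set from lr_at() before every optimizer step.
```

Starting warmup at `(step + 1) / warmup_steps` avoids a zero learning rate on the very first step; other warmup shapes are equally valid.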
|
|
## Intended use

- Generating melodic / harmonic continuations from short MIDI prompts
- Jamming and sketching inside a DAW
- Research baselines for MIDI-only generative models at this scale

Out of scope:

- Audio output (this model is symbolic only)
- Text-conditioned generation
- Lyrics / vocal generation
|
|
## Limitations

- Pilot run; output quality is expected to improve with more parameters and training data
- Single-track, piano-style prompts work best (the training corpus is piano-heavy)
- Drum prompts are weakly supported
- Generations can drift into non-diatonic (out-of-key) notes; for tonal applications, consider snapping output pitches to a scale as a post-processing step
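
A minimal sketch of the scale snapping suggested above, assuming the generated notes are available as plain MIDI pitch numbers (0 to 127) and the target key is known. It moves each out-of-key pitch to the nearest in-scale pitch, breaking ties downward. `snap_pitch`, `snap_to_scale`, and the tie-breaking rule are illustrative choices, not something the model or `midigenai` provides.

```python
from typing import Iterable, List, Set

MAJOR = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale from its tonic


def snap_pitch(pitch: int, allowed: Set[int]) -> int:
    """Return the nearest MIDI pitch (0-127) whose pitch class is in `allowed`.

    Ties (equally far below and above) resolve downward.
    """
    for distance in range(12):
        for candidate in (pitch - distance, pitch + distance):
            if 0 <= candidate <= 127 and candidate % 12 in allowed:
                return candidate
    return pitch  # empty scale: leave the note unchanged


def snap_to_scale(pitches: Iterable[int], tonic: int = 0,
                  scale: Iterable[int] = MAJOR) -> List[int]:
    """Snap every pitch to the scale; `tonic` is a pitch class (0 = C)."""
    allowed = {(tonic + step) % 12 for step in scale}
    return [snap_pitch(p, allowed) for p in pitches]


# C major: C#4 -> C4, F#4 -> F4, Bb4 -> A4; in-key notes are untouched.
print(snap_to_scale([60, 61, 66, 70, 64]))  # [60, 60, 65, 69, 64]
```

Snapping can collapse two chord tones onto the same pitch, so deduplicating simultaneous notes afterward may be worthwhile.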
|
|
## License

[MIT](https://github.com/nicholasbien/midi-gen-ai/blob/main/LICENSE).
|
|