---
license: mit
tags:
- music
- midi
- music-generation
- transformer
- pytorch
library_name: pytorch
pipeline_tag: text-generation
---

# MIDI Gen AI

A custom 25M-parameter transformer trained from scratch on symbolic MIDI for **real-time music continuation**. It is the companion model to the [`midi-gen-ai`](https://github.com/nicholasbien/midi-gen-ai) source repo and is designed to fit into a real-time Ableton Live workflow.

## Model summary

| | |
|---|---|
| Architecture | GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2) |
| Parameters | 25M |
| Vocabulary | 641 (custom event-based via [MidiTok MIDILike](https://github.com/Natooz/MidiTok)) |
| Context | 2048 tokens |
| Training data | Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduplicated (~158k MIDI files in the pilot) |
| Training infra | Lambda Labs H100 (1×) |
| Tokens / note | ~4 |
| Inference (CPU) | ~278 tok/s (≈ 70 notes/s) |

This is the **pilot 25M** version. A larger 200M run is planned.
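
The throughput figures in the table translate into a rough real-time budget. The sketch below only redoes that arithmetic with the approximate numbers quoted above; the helper names are illustrative and not part of the repo.

```python
# Approximate figures copied from the model summary table.
TOKENS_PER_NOTE = 4        # "~4" tokens per note
CPU_TOKENS_PER_SEC = 278   # "~278 tok/s" on CPU
CONTEXT_TOKENS = 2048      # context window


def notes_per_second(tok_per_sec=CPU_TOKENS_PER_SEC, tok_per_note=TOKENS_PER_NOTE):
    """Generation speed expressed in notes instead of tokens."""
    return tok_per_sec / tok_per_note


def generation_seconds(max_new_tokens, tok_per_sec=CPU_TOKENS_PER_SEC):
    """Estimated wall-clock time to generate max_new_tokens on CPU."""
    return max_new_tokens / tok_per_sec


def context_notes(context_tokens=CONTEXT_TOKENS, tok_per_note=TOKENS_PER_NOTE):
    """Roughly how many notes fit into the context window."""
    return context_tokens // tok_per_note


print(f"{notes_per_second():.1f} notes/s")                 # 69.5 notes/s
print(f"{generation_seconds(512):.2f} s for 512 tokens")   # 1.84 s for 512 tokens
print(f"{context_notes()} notes of context")               # 512 notes of context
```

At these rates, a 512-token continuation (about 128 notes) takes a little under two seconds on CPU.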

## Versions

| subfolder | model | status |
|---|---|---|
| `v2-pilot/` | 25M custom transformer (Lambda-trained) | current best ✅ |

Future versions will be added as additional subfolders.

## Files (per version)

- `ckpt_final.pt` — PyTorch checkpoint (state dict + model config)
- `tokenizer.json` — saved [MidiTok MIDILike](https://github.com/Natooz/MidiTok) tokenizer
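
The card does not document the key names inside `ckpt_final.pt`, so a safe first step is to list whatever the file contains. This is a minimal sketch: `describe_checkpoint` and `load_checkpoint` are hypothetical helpers, and the stand-in dictionary below is made up for illustration.

```python
def describe_checkpoint(obj):
    """Map each top-level key to a short description of its value."""
    summary = {}
    for key, value in obj.items():
        if isinstance(value, dict):  # also matches OrderedDict state dicts
            summary[key] = f"dict with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary


def load_checkpoint(path):
    """Load a checkpoint onto the CPU (needs PyTorch installed)."""
    import torch  # imported lazily so describe_checkpoint works without PyTorch
    return torch.load(path, map_location="cpu")


# Stand-in object; the real key names are an assumption, not taken from the repo.
fake_ckpt = {"model": {"w": 0, "b": 0}, "config": {"n_layer": 1}, "step": 10}
print(describe_checkpoint(fake_ckpt))
```

With a real download, `describe_checkpoint(load_checkpoint(path))` shows which entry holds the weights and which holds the config. Recent PyTorch releases default to `weights_only=True` in `torch.load`; if that rejects the file, passing `weights_only=False` is an option for checkpoints you trust.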

## Usage

Direct, low-level:

```python
from huggingface_hub import hf_hub_download
from midigenai.v2.generate_v2 import V2Generator

# Fetch the checkpoint and tokenizer (cached under ~/.cache/huggingface/).
ckpt = hf_hub_download("nicholasbien/midigenai", "ckpt_final.pt", subfolder="v2-pilot")
tok = hf_hub_download("nicholasbien/midigenai", "tokenizer.json", subfolder="v2-pilot")

# Load the model, tokenize a seed MIDI file, and write the continuation to out.mid.
gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
```
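
For readers unfamiliar with the `temperature` and `top_k` arguments, the sketch below shows what temperature-scaled top-k sampling conventionally does to one step of logits. It is a generic, dependency-free illustration, not the repo's implementation; `sample_top_k` is a hypothetical name.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=50, rng=random):
    """Sample one token id from logits using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(top_k, len(logits)))
    # Keep only the k highest-scoring token ids.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Numerically stable softmax weights over the kept logits.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 3.0, 0.0]
# With top_k=2 only the two best ids (3 and 0) can ever be drawn.
print(sample_top_k(logits, temperature=1.0, top_k=2, rng=rng))
```

Lowering `top_k` or `temperature` makes continuations more conservative; raising them adds variety at the cost of more off-key or off-grid notes.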

Higher level: use [ableton-mcp-pro](https://github.com/nicholasbien/ableton-mcp-pro) for live use inside Ableton Live.

## Training

See the [v2 plan + scripts](https://github.com/nicholasbien/midi-gen-ai) in the source repo.

- Standard next-token cross-entropy loss; AdamW (β₁ = 0.9, β₂ = 0.95), weight decay 0.1; cosine LR schedule with 2k warmup
- Batch ~500k tokens/step (gradient accumulation)
- Tempo stripped at training, re-applied at decode (model is tempo-invariant)
- BOS-prefixed prompts; EOS terminates generation
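
The learning-rate schedule listed above can be sketched as a linear warmup followed by cosine decay. This is a generic version under stated assumptions: the "2k warmup" is read as 2,000 optimizer steps, and `peak_lr`, `min_lr`, and `total_steps` are hypothetical values that the card does not give.

```python
import math


def lr_at(step, peak_lr, total_steps, warmup_steps=2000, min_lr=0.0):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Hypothetical run: peak LR 3e-4 over 10,000 steps.
for step in (0, 1999, 6000, 10000):
    print(step, f"{lr_at(step, peak_lr=3e-4, total_steps=10_000):.2e}")

# If every sequence filled the 2048-token context, ~500k tokens per step
# would correspond to about 244 sequences per optimizer step.
print(round(500_000 / 2048))
```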

## Intended use

- Generating melodic / harmonic continuations from short MIDI prompts
- Inside a DAW for jamming / sketching
- Research baselines for MIDI-only generative models at this scale

Out of scope:
- Audio output (this model is symbolic only)
- Text-conditioned generation
- Lyrics / vocal generation

## Limitations

- Pilot run; quality is expected to improve with more parameters and more training data
- Single-track / piano-style prompts work best (training corpus is piano-heavy)
- Drum prompts are weakly supported
- Output can drift outside the key (non-diatonic notes); for tonal applications, consider snapping generated pitches to a scale in post-processing
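
One simple way to handle the drift mentioned in the last item is to move each generated pitch to the nearest note of a chosen scale. The sketch below works on plain MIDI note numbers and is not part of the repo; `snap_to_scale` is a hypothetical helper, and the key is assumed to be known in advance.

```python
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the root


def snap_to_scale(pitch, root=0, scale=MAJOR_SCALE):
    """Return the in-scale MIDI pitch nearest to `pitch` (ties resolve downward)."""
    allowed = {(root + degree) % 12 for degree in scale}
    for offset in range(12):
        # Check the lower neighbour first so ties resolve downward.
        for candidate in (pitch - offset, pitch + offset):
            if 0 <= candidate <= 127 and candidate % 12 in allowed:
                return candidate
    return pitch  # only reached if the scale is empty


# C major (root=0): C4, C#4, D#4, F#4, A#4
print([snap_to_scale(p) for p in [60, 61, 63, 66, 70]])  # [60, 60, 62, 65, 69]
# G major (root=7) keeps F#4 as it is.
print(snap_to_scale(66, root=7))  # 66
```

Snapping every note removes intentional chromatic passing tones as well, so it may be worth applying it only to notes above some duration or velocity threshold.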

## License

[MIT](https://github.com/nicholasbien/midi-gen-ai/blob/main/LICENSE).