---
license: mit
tags:
- music
- midi
- music-generation
- transformer
- pytorch
library_name: pytorch
pipeline_tag: text-generation
---

# MIDI Gen AI

A custom 25M-parameter transformer trained from scratch on symbolic MIDI for real-time music continuation. It is the companion model to the [`midi-gen-ai`](https://github.com/nicholasbien/midi-gen-ai) source repo and is designed to fit into a live Ableton Live workflow.

## Model summary

| Property | Value |
|---|---|
| Architecture | GPT-style decoder-only transformer (RoPE, SwiGLU, RMSNorm, FlashAttention-2) |
| Parameters | 25M |
| Vocabulary | 641 tokens (custom event-based vocabulary via [MidiTok MIDILike](https://github.com/Natooz/MidiTok)) |
| Context | 2048 tokens |
| Training data | Lakh + MAESTRO + POP909 + GiantMIDI + LAMD, deduplicated (~158k MIDI files for the pilot) |
| Training infra | 1× H100 (Lambda Labs) |
| Tokens per note | ~4 |
| Inference speed (CPU) | ~278 tokens/s ≈ 70 notes/s |

This is the 25M pilot version; a larger 200M-parameter run is planned.
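
The Architecture row lists standard transformer components. As an illustration only (plain Python, not code from the checkpoint or the source repo), two of them can be sketched: RMSNorm rescales a vector by its root-mean-square, and RoPE rotates pairs of query/key dimensions by position-dependent angles so that attention scores depend only on the relative offset between positions. The function names, `eps=1e-6`, and `base=10000.0` are assumptions taken from the common formulations; the model's actual settings may differ.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-dimension gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def rope(x, pos, base=10000.0):
    """RoPE: rotate each (even, odd) pair of dimensions i, i+1 by pos * base**(-i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# The query/key score depends only on the offset (here 2), not on absolute positions.
q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
near = dot(rope(q, 5), rope(k, 3))
far = dot(rope(q, 105), rope(k, 103))
print(math.isclose(near, far, abs_tol=1e-9))  # True
```

In a real model these operations run on batched tensors inside every attention layer; the loop form here is only for readability.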

## Versions

| Subfolder | Model | Status |
|---|---|---|
| `v2-pilot/` | 25M custom transformer (trained on Lambda Labs) | current best ✅ |

Future versions will be added as additional subfolders.
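
Because each version lives in its own subfolder, the available versions can be discovered from the repo's file listing. The sketch below uses a hypothetical helper, `find_versions`, that keeps top-level folders directly containing both per-version files (`ckpt_final.pt` and `tokenizer.json`). It works on any list of paths, for example the output of `huggingface_hub.list_repo_files("nicholasbien/midigenai")`; the hard-coded listing is for illustration only.

```python
from collections import defaultdict

# Files that every version subfolder is expected to contain.
REQUIRED = {"ckpt_final.pt", "tokenizer.json"}

def find_versions(repo_files):
    """Return sorted top-level subfolders that directly contain all REQUIRED files."""
    by_folder = defaultdict(set)
    for path in repo_files:
        folder, sep, name = path.partition("/")
        if sep and "/" not in name:  # skip root-level files and deeper nesting
            by_folder[folder].add(name)
    return sorted(f for f, names in by_folder.items() if REQUIRED <= names)

# Illustrative listing; on the Hub, pass
# huggingface_hub.list_repo_files("nicholasbien/midigenai") instead.
files = ["README.md", "v2-pilot/ckpt_final.pt", "v2-pilot/tokenizer.json"]
print(find_versions(files))  # ['v2-pilot']
```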

## Files (per version)

- `ckpt_final.pt`: PyTorch checkpoint (state dict + model config)
- `tokenizer.json`: saved [MidiTok MIDILike](https://github.com/Natooz/MidiTok) tokenizer

## Usage

Direct, low-level usage (assumes the `midigenai` package from the source repo is importable):

```python
from huggingface_hub import hf_hub_download
from midigenai.v2.generate_v2 import V2Generator  # from the midi-gen-ai source repo

REPO_ID = "nicholasbien/midigenai"
VERSION = "v2-pilot"

# Fetch the checkpoint and tokenizer (cached under ~/.cache/huggingface/ by default).
ckpt = hf_hub_download(REPO_ID, "ckpt_final.pt", subfolder=VERSION)
tok = hf_hub_download(REPO_ID, "tokenizer.json", subfolder=VERSION)

# Load the model, encode a seed MIDI file into a prompt, and write a continuation.
gen = V2Generator(checkpoint_path=ckpt, tokenizer_path=tok)
prompt = gen.encode_midi_file("seed.mid")
gen.generate_to_midi(prompt, "out.mid", max_new_tokens=512, temperature=1.0, top_k=50)
```

For higher-level, live use inside Ableton Live, see [ableton-mcp-pro](https://github.com/nicholasbien/ableton-mcp-pro).
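
The `temperature` and `top_k` arguments to `generate_to_midi` control sampling. The sketch below shows generic top-k sampling with temperature and an EOS-terminated loop in plain Python; it is not `V2Generator`'s implementation. `sample_top_k`, `generate`, and the toy `next_logits` callable are hypothetical names, and cropping the input to the last 2048 tokens (the context length in the summary) is an assumption about how long sequences are handled.

```python
import math
import random

def sample_top_k(logits, top_k=50, temperature=1.0, rng=random):
    """Sample a token id from the top_k largest logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

def generate(next_logits, prompt, eos_id, max_new_tokens=512, context=2048, **sampling):
    """Append sampled tokens until EOS is produced or max_new_tokens is reached."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = sample_top_k(next_logits(tokens[-context:]), **sampling)
        tokens.append(tok)
        if tok == eos_id:
            break
    return tokens

# Toy "model" over a 4-token vocabulary in which token 3 (EOS) is the clear favorite.
toy = lambda toks: [0.5, 0.2, 0.1, 5.0]
print(generate(toy, prompt=[0], eos_id=3, top_k=1))  # [0, 3]
```

Lowering `temperature` sharpens the distribution toward the most likely tokens, and `top_k=1` reduces to greedy decoding.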

## Training

See the [v2 plan and scripts](https://github.com/nicholasbien/midi-gen-ai) in the source repo.

- Standard next-token cross-entropy loss; AdamW (β₁ = 0.9, β₂ = 0.95), weight decay 0.1; cosine LR schedule with 2k warmup steps
- ~500k tokens per optimizer step (reached via gradient accumulation)
- Tempo is stripped during training and re-applied at decode time, so the model is tempo-invariant
- Prompts are BOS-prefixed; an EOS token terminates generation
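
The schedule and batch size above can be made concrete with a small sketch. It assumes linear warmup followed by cosine decay (a common reading of "cosine LR with 2k warmup"); the peak LR, total step count, minimum LR, and micro-batch size are hypothetical placeholders because the card does not state them. The second function shows the gradient-accumulation arithmetic: with 2048-token sequences and a hypothetical micro-batch of 32 sequences, 8 micro-batches give 524,288 tokens per optimizer step, close to the ~500k target.

```python
import math

def lr_at(step, peak_lr=3e-4, warmup_steps=2000, total_steps=100_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps.

    peak_lr, total_steps, and min_lr are hypothetical defaults, not values from the card.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def accumulation_steps(target_tokens=500_000, seq_len=2048, micro_batch=32):
    """Micro-batches per optimizer step needed to get close to target_tokens."""
    return max(1, round(target_tokens / (seq_len * micro_batch)))

steps = accumulation_steps()
print(steps, steps * 2048 * 32)  # 8 524288
```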

## Intended use

- Generating melodic / harmonic continuations from short MIDI prompts
- Jamming and sketching inside a DAW
- Research baselines for MIDI-only generative models at this scale

Out of scope:
- Audio output (this model is symbolic only)
- Text-conditioned generation
- Lyrics / vocal generation

## Limitations

- This is a pilot run; output quality is expected to improve with more parameters and training data
- Single-track, piano-style prompts work best (the training corpus is piano-heavy)
- Drum prompts are only weakly supported
- Generations can drift to non-diatonic notes; for tonal applications, consider snapping pitches to a scale in post-processing
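
A minimal sketch of the post-snap suggestion, assuming the user picks the key: each generated MIDI pitch is moved to the nearest pitch in the chosen scale, with ties resolved downward. `snap_to_scale`, the scale tuples, and the tie rule are illustrative choices, not part of the source repo.

```python
# Semitone offsets from the tonic for two common diatonic scales.
MAJOR = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR = (0, 2, 3, 5, 7, 8, 10)

def snap_to_scale(pitch, tonic=0, scale=MAJOR):
    """Move a MIDI pitch (0-127) to the nearest in-scale pitch; ties go downward.

    tonic is a pitch class (0 = C, 2 = D, 9 = A, ...).
    """
    pitch_classes = {(tonic + s) % 12 for s in scale}
    # Search outward from the original pitch: 0, -1, +1, -2, +2, ..., -6, +6.
    for delta in sorted(range(-6, 7), key=lambda d: (abs(d), d)):
        candidate = pitch + delta
        if 0 <= candidate <= 127 and candidate % 12 in pitch_classes:
            return candidate
    return pitch  # no in-range scale tone within a tritone; leave unchanged

notes = [60, 61, 63, 66, 70]
print([snap_to_scale(p) for p in notes])  # C major: [60, 60, 62, 65, 69]
```

Snapping also removes intentional chromatic notes, so it is best applied only where strict tonality is wanted.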

## License

[MIT](https://github.com/nicholasbien/midi-gen-ai/blob/main/LICENSE).