Slay Micro-Models — tiny from-scratch music experts + composition research

A family of ~0.8M-parameter char-level GPTs, each trained from scratch on monophonic music in ABC notation, plus experiments in composing small experts (stitching, ensembling, duets). Built as research for the Slayer collective (toward a "small models, composed" paper). Music is the sandbox; the methods generalize.

Every expert shares one architecture: decoder-only Transformer — 4 layers, 4 heads, d_model=128, context 128 chars, character-level. Trained on CPU in minutes.

The experts

Expert (`data/models/`)	Style / render	Training data	Val perplexity
`jig_ckpt.pt`	Irish jig, 6/8	12.1k tunes (thesession.org)	3.80
`bach_ckpt.pt`	Baroque chorale soprano	350 soprano lines (music21)	2.09*
`waltz_ckpt.pt`	Lyrical waltz, 3/4 → piano	3.0k tunes	~4.4
`reel_ckpt.pt`	Driving fiddle, 4/4 → violin	17.2k tunes	~4.9
`reel_sv_ckpt.pt`	reel on shared vocab (for composition)	17.2k tunes	~4.9

* Bach ppl is not directly comparable (smaller vocab + very repetitive data).

Composition experiments

E0 (self-stitch) ✅ — a trained linear mapper at an intermediate seam is lossless (Δppl ≈ 0): the stitching mechanism is sound (validates plumbing, not the thesis).
Ensemble fusion (src/compose/fuse.py) — blend two experts' next-token distributions (shared vocab) → audible hybrid. This is the flat-weighting baseline.
Duet (src/compose/duet.py) — two experts layered (piano + violin, simultaneous): multi-track, not model-level fusion.
Next — E1: representation-level stitch (the actual hypothesis, meant to beat these baselines).

Pipeline (`src/`)

prepare_data.py / prepare_bach.py (build ABC corpus) → gpt.py (architecture) → train_gpt.py (train; optional shared vocab) → make_midi.py / gen_samples.py (generate + render) → e0_stitch.py / fuse.py / duet.py (composition) · ngram_model.py (baseline) · abc_to_midi.py (render).

Usage

pip install torch music21
python src/generate/gen_samples.py --ckpt data/models/waltz_ckpt.pt --meter 3/4 --keys D,G,Emin --inst piano --out out

Honest scope

Shown: a small char-LM learns real musical structure (meter, key signatures, cadences) from next-token prediction alone; data cleaning measurably helps (ppl 3.88→3.80); the stitch mechanism is lossless; experts can be combined (baseline). Not yet shown: that representation-level composition of small experts beats a single model — the open hypothesis (E1+).

Data & license

Code & weights: MIT. Training data not redistributed — folk tunes from thesession.org (rebuild via prepare_data.py); Bach chorale sopranos via music21. Please respect source terms.

Built by Arkadiusz Słota for the Slayer collective. Educational / research project.

Downloads last month: -; Downloads are not tracked for this model. How to track