Slay Micro-Models — tiny from-scratch music experts + composition research

A family of ~0.8M-parameter character-level GPTs, each trained from scratch on monophonic music in ABC notation, plus experiments in composing small experts (stitching, ensembling, duets). Built as research for the Slayer collective, toward a "small models, composed" paper. Music is the sandbox; the methods are intended to generalize beyond it.

Every expert shares one architecture: a decoder-only, character-level Transformer with 4 layers, 4 heads, d_model=128 and a 128-character context. Each expert trains on CPU in minutes.
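You can check the ~0.8M parameter figure by hand. The sketch below assumes a nanoGPT-style block, which gpt.py may not match exactly: biases on every Linear layer, a 4x MLP, learned position embeddings, and an output head tied to the token embedding. `vocab_size=90` is a hypothetical value, because the card does not give the character vocabulary size.

```python
# Rough parameter count for a nanoGPT-style decoder-only Transformer.
# Assumptions (not confirmed by gpt.py): Linear biases everywhere, 4x MLP,
# learned position embeddings, output head tied to the token embedding.

def gpt_param_count(n_layer=4, d_model=128, block_size=128,
                    vocab_size=90, tie_weights=True):
    d = d_model
    per_layer = (
        2 * d                  # LayerNorm before attention (weight + bias)
        + d * 3 * d + 3 * d    # fused Q/K/V projection
        + d * d + d            # attention output projection
        + 2 * d                # LayerNorm before the MLP
        + d * 4 * d + 4 * d    # MLP up-projection (4x expansion)
        + 4 * d * d + d        # MLP down-projection
    )
    embeddings = vocab_size * d + block_size * d  # token + position tables
    final_ln = 2 * d
    lm_head = 0 if tie_weights else vocab_size * d
    return n_layer * per_layer + embeddings + final_ln + lm_head


total = gpt_param_count()
print(f"{total:,}")  # 821,248 with the hypothetical vocab_size=90
```

The transformer blocks account for almost all of the count: 4 × (12·128² + 13·128) = 793,088. The head count (4) does not appear in the formula, because the heads split d_model without adding any weights.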

The experts

| Expert (data/models/) | Style / render | Training data | Val perplexity |
|---|---|---|---|
| jig_ckpt.pt | Irish jig, 6/8 | 12.1k tunes (thesession.org) | 3.80 |
| bach_ckpt.pt | Baroque chorale soprano | 350 soprano lines (music21) | 2.09* |
| waltz_ckpt.pt | Lyrical waltz, 3/4 → piano | 3.0k tunes | ~4.4 |
| reel_ckpt.pt | Driving fiddle, 4/4 → violin | 17.2k tunes | ~4.9 |
| reel_sv_ckpt.pt | Reel on shared vocab (for composition) | 17.2k tunes | ~4.9 |

* The Bach perplexity is not directly comparable to the others: that expert uses a smaller vocabulary, and its training data is very repetitive.
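Vocabulary size matters because perplexity is the exponential of the mean per-character negative log-likelihood. We assume here that the repo's evaluation uses this standard definition. A model that guesses uniformly over V characters scores exactly V, so the "know nothing" ceiling moves with the alphabet. The sketch below uses hypothetical vocabulary sizes.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-likelihood assigned to each true next token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# A uniform guesser over V symbols scores ppl == V, so a smaller alphabet
# lowers the ceiling and makes low perplexity easier to reach.
for vocab_size in (40, 90):  # hypothetical vocabulary sizes
    uniform = [1 / vocab_size] * 100
    print(vocab_size, round(perplexity(uniform), 6))
```

Repetitive data pushes in the same direction: when most next characters are near-certain, the per-token probabilities approach 1 and perplexity approaches 1.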

Composition experiments

  • E0 (self-stitch) ✅: a trained linear mapper inserted at an intermediate seam is lossless (Δppl ≈ 0), so the stitching mechanism works. This validates the plumbing, not the thesis.
  • Ensemble fusion (src/compose/fuse.py): blend two experts' next-token distributions over a shared vocabulary to get an audible hybrid. This is the flat-weighting baseline.
  • Duet (src/compose/duet.py): two experts layered as simultaneous tracks (piano + violin). This is multi-track arrangement, not model-level fusion.
  • Next, E1: representation-level stitching, the actual hypothesis, which is meant to beat these baselines.
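The fusion baseline can be sketched as a flat-weighted blend of two next-token distributions. This is a minimal sketch and assumes a linear mixture with a single global weight `w`; fuse.py may blend differently, for example in log space (a product of experts). `fuse` and `sample` are hypothetical helpers. In real generation, both experts would score the same context at every step before blending.

```python
import random


def fuse(p_a, p_b, w=0.5):
    """Linear mixture of two next-token distributions over the same vocabulary.
    w weights expert A and (1 - w) weights expert B; inputs must each sum to 1."""
    if len(p_a) != len(p_b):
        raise ValueError("experts must share a vocabulary")
    return [w * a + (1 - w) * b for a, b in zip(p_a, p_b)]


def sample(probs, temperature=1.0, rng=random):
    """Draw one token id after reshaping the distribution by temperature.
    random.choices normalises the weights, so they need not sum to 1."""
    weights = [p ** (1 / temperature) for p in probs]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


vocab = ["A", "B", "c", "|"]     # hypothetical shared vocabulary
p_jig = [0.7, 0.1, 0.1, 0.1]     # hypothetical jig-expert prediction
p_reel = [0.1, 0.1, 0.7, 0.1]    # hypothetical reel-expert prediction
mixed = fuse(p_jig, p_reel, w=0.5)  # roughly [0.4, 0.1, 0.4, 0.1]
next_char = vocab[sample(mixed, temperature=0.8, rng=random.Random(0))]
```

With a linear mixture, the hybrid keeps each expert's favourite continuations, in proportion to `w`. A log-space blend would instead favour tokens that both experts agree on.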

Pipeline (src/)

  • Data: prepare_data.py / prepare_bach.py build the ABC corpus.
  • Model: gpt.py defines the architecture; train_gpt.py trains it (optionally on a shared vocab).
  • Generation: make_midi.py / gen_samples.py generate and render samples; abc_to_midi.py handles rendering.
  • Composition: e0_stitch.py, fuse.py, duet.py.
  • Baseline: ngram_model.py.
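Fusion and stitching only make sense if both experts assign the same id to the same character. A per-corpus vocabulary would break that, because the same character can get different ids in different corpora. The sketch below shows one way to build a shared vocabulary: take the union of characters across all corpora and index them in a fixed order. We assume train_gpt.py's shared-vocab option does something equivalent, but we have not checked. The helper names and the toy tunes are hypothetical.

```python
def build_shared_vocab(*corpora):
    """One char->id table over the union of all corpora, so every expert
    trained with it emits distributions over identical token ids."""
    chars = sorted(set().union(*(set(text) for text in corpora)))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


# Hypothetical toy tunes standing in for the jig and reel corpora.
jig = "X:1\nM:6/8\nK:D\n|:dfa afd|"
reel = "X:2\nM:4/4\nK:G\n|:GABc dBGB|"
stoi, itos = build_shared_vocab(jig, reel)
round_trip = decode(encode(reel, stoi), itos)
```

Sorting the character set makes the id assignment deterministic, so independently trained experts agree on the table as long as they see the same corpora.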

Usage

pip install torch music21
python src/generate/gen_samples.py --ckpt data/models/waltz_ckpt.pt --meter 3/4 --keys D,G,Emin --inst piano --out out
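The card does not say how gen_samples.py turns `--meter` and `--keys` into conditioning. A common approach with char-level ABC models is to prime generation with the tune header and let the model write the body. The sketch below assumes that approach. `abc_prompts` is a hypothetical helper, not the script's code. X:, M: and K: are standard ABC header fields, and K: conventionally closes the header, which is why it comes last.

```python
def abc_prompts(meter, keys, tune_id_start=1):
    """Hypothetical helper: one ABC header per comma-separated key, ending
    where a char-level model would continue with the tune body."""
    return [
        f"X:{tune_id_start + i}\nM:{meter}\nK:{key}\n"
        for i, key in enumerate(keys.split(","))
    ]


# Mirrors the command above: --meter 3/4 --keys D,G,Emin
prompts = abc_prompts("3/4", "D,G,Emin")
for prompt in prompts:
    print(repr(prompt))
```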

Honest scope

  • Shown: a small char-LM learns real musical structure (meter, key signatures, cadences) from next-token prediction alone; data cleaning measurably helps (ppl 3.88 → 3.80); the stitch mechanism is lossless; experts can be combined (baseline).
  • Not yet shown: that representation-level composition of small experts beats a single model. This is the open hypothesis (E1+).
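A "lossless" self-stitch is exactly what you would expect, and the sketch below shows why. It assumes the seam mapper is fit by least squares on paired hidden states; e0_stitch.py may instead train it end-to-end on the language-modelling loss. When the source and target states come from the same expert, the optimal map is the identity, so the fitted mapper can always reach zero error. That is why E0 only checks the plumbing. The hidden states here are random stand-ins, not real activations.

```python
import random


def fit_linear_mapper(h_src, h_tgt, lr=0.1, steps=500):
    """Fit W (d x d) minimising mean ||x @ W - y||^2 over row pairs (x, y)
    by plain gradient descent, starting from W = 0."""
    d, n = len(h_src[0]), len(h_src)
    W = [[0.0] * d for _ in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for x, y in zip(h_src, h_tgt):
            pred = [sum(x[i] * W[i][j] for i in range(d)) for j in range(d)]
            for i in range(d):
                for j in range(d):
                    grad[i][j] += 2 * x[i] * (pred[j] - y[j]) / n
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * grad[i][j]
    return W


def mapper_mse(h_src, h_tgt, W):
    """Mean squared reconstruction error of the mapped source states."""
    d = len(W)
    total = 0.0
    for x, y in zip(h_src, h_tgt):
        pred = [sum(x[i] * W[i][j] for i in range(d)) for j in range(d)]
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(h_src)


# Self-stitch: source and target are the same (stand-in) hidden states.
rng = random.Random(0)
hidden = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
W = fit_linear_mapper(hidden, hidden)
self_stitch_error = mapper_mse(hidden, hidden, W)  # near zero; W near identity
```

For a cross-expert stitch (E1), `h_tgt` would come from a different expert. The remaining error then measures how far the two representations are from being linearly related, and that gap is what E1 is meant to probe.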

Data & license

Code and weights: MIT. Training data is not redistributed: the folk tunes come from thesession.org (rebuild the corpus with prepare_data.py), and the Bach chorale sopranos come from music21 (prepare_bach.py). Please respect each source's terms.

Built by Arkadiusz Słota for the Slayer collective. Educational / research project.
