TheArtist Music Transformer — F1 (Pop 10K Mix, pop-leaning)
Jazz-adapted chord model with a 10,000-sequence pop rehearsal buffer, the pop-leaning endpoint of the mix-ratio sweep. Pop top-1 accuracy actually improves on the pre-fine-tune baseline, and jazz top-1 gains 8.17 points.
One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the paper for full context and the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.
Demo
Watch TheArtist in action on YouTube — interactive staff editor, MIDI input, AI generation with live progress, and per-genre LoRA playback across the 13-genre vocabulary.
Model summary
| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 6 |
Training data
All 1,513 jazz training sequences (Jazz Harmony Treebank, JazzStandards, Weimar Jazz Database, JAAH) plus 10,000 pop rehearsal sequences sub-sampled with seed 42 from the Phase 0 pop training split. Pop:jazz ≈ 6.6:1 in the mix.
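The mixing step above can be sketched as a seeded subsample of the pop split concatenated with the full jazz split. This is a minimal stdlib sketch, not the project's training script: the function name `build_rehearsal_mix`, the use of `random.Random(42).sample`, the final shuffle, and the 50,000-sequence toy pop split are all assumptions, so it will not reproduce the exact subset the paper used. It does show where the ≈ 6.6:1 ratio comes from (10,000 / 1,513).

```python
import random


def build_rehearsal_mix(jazz_train, pop_train, n_pop=10_000, seed=42):
    """Hypothetical sketch: keep every jazz sequence and add a seeded pop subsample."""
    rng = random.Random(seed)                  # dedicated RNG so the draw is reproducible
    pop_buffer = rng.sample(pop_train, n_pop)  # sample without replacement
    mix = list(jazz_train) + pop_buffer
    rng.shuffle(mix)                           # interleave genres before batching (assumption)
    return mix, len(pop_buffer) / len(jazz_train)


# Toy stand-ins for the real splits; the pop split size here is a placeholder.
jazz = [("jazz", i) for i in range(1_513)]
pop = [("pop", i) for i in range(50_000)]
mix, ratio = build_rehearsal_mix(jazz, pop)
print(len(mix), round(ratio, 2))
```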
Fine-tune hyperparameters: peak learning rate 2 × 10⁻⁵, two-epoch warmup, ten epochs maximum with patience 5.
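A minimal sketch of how these hyperparameters could interact, assuming a linear per-step warmup to the peak and an early-stopping counter that resets whenever validation loss improves. The card does not say what the learning rate does after warmup, so this sketch simply holds the peak; `lr_at`, `run_with_early_stopping`, `steps_per_epoch`, and the toy loss curve are hypothetical.

```python
PEAK_LR = 2e-5
WARMUP_EPOCHS = 2
MAX_EPOCHS = 10
PATIENCE = 5


def lr_at(step, steps_per_epoch):
    """Linear warmup to PEAK_LR over two epochs, then constant (post-warmup schedule assumed)."""
    warmup_steps = WARMUP_EPOCHS * steps_per_epoch
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / warmup_steps
    return PEAK_LR


def run_with_early_stopping(val_losses):
    """Return (best_epoch, epochs_run) for 1-indexed per-epoch validation losses."""
    best_loss, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses[:MAX_EPOCHS], start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= PATIENCE:
                return best_epoch, epoch
    return best_epoch, min(len(val_losses), MAX_EPOCHS)


# Illustrative (made-up) loss curve whose minimum falls at epoch 6.
toy_losses = [1.0, 0.9, 0.8, 0.75, 0.72, 0.70, 0.71, 0.71, 0.72, 0.73]
print(run_with_early_stopping(toy_losses))
```

Under this reading of patience, a best epoch of 6 leaves only four epochs before the 10-epoch cap, so early stopping could not fire and the run would end at the cap.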
Evaluation (held-out per-genre test sets)
| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 84.60% | 81.03% |
| Top-5 accuracy | 96.96% | 92.41% |
| Perplexity | 1.78 | 2.31 |
| Δ top-1 vs. Phase 0 baseline (pp) | +0.36 | +8.17 |
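This table reports perplexity, while the per-genre tables further down report val loss. Assuming perplexity is exp of the mean per-token natural-log cross-entropy (the usual convention; the card does not state the log base), the two convert as in this small sketch. The helper names are illustrative only.

```python
import math


def loss_from_perplexity(ppl):
    """Mean per-token cross-entropy in nats implied by a perplexity (assumes ppl = exp(loss))."""
    return math.log(ppl)


def perplexity_from_loss(loss):
    """Inverse conversion, e.g. for reading a val_loss column as a perplexity."""
    return math.exp(loss)


for genre, ppl in [("pop", 1.78), ("jazz", 2.31)]:
    print(genre, round(loss_from_perplexity(ppl), 3))
```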
This is the only run in the sweep whose pop top-1 exceeds the Phase 0 baseline. It is also the run with the most stable pop curve over training. Choose F1 when pop fluency is a hard constraint and jazz coloration is welcome but not the primary target. Generations stay rooted in commercial pop and rock harmony, with jazz substitutions appearing selectively (an occasional secondary dominant or ii-V detour inside an otherwise diatonic loop).
Out-of-distribution per-genre baseline (added 2026-05-11)
F1 alone (no LoRA), measured on each of the 11 Chordonomicon genre val splits that the per-genre LoRA adapters target. The embedding matrix is resized to hold the new genre tokens, and inputs are conditioned on [GENRE:none] (i.e. unconditioned), since the F1 base never saw those tokens during training. This is the no-LoRA reference reported alongside every lora-<genre> adapter card.
| Genre | Val seq. | F1 top-1 (%) | F1 top-5 (%) | F1 val loss |
|---|---|---|---|---|
| hip-hop | 1,402 | 86.51 | 96.27 | 0.6240 |
| electronic | 1,519 | 84.50 | 95.93 | 0.6835 |
| rock | 4,891 | 82.79 | 96.75 | 0.5865 |
| folk | 6,075 | 82.66 | 95.80 | 0.7406 |
| funk | 283 | 82.54 | 94.38 | 0.7878 |
| country | 6,173 | 82.45 | 96.22 | 0.7402 |
| rnb/soul | 955 | 82.09 | 94.12 | 0.8119 |
| blues | 994 | 81.70 | 94.80 | 0.8137 |
| gospel | 374 | 79.34 | 94.73 | 0.8813 |
| bossa | 1,431 | 78.33 | 93.64 | 0.9635 |
| classical | 37 | 43.54 | 72.82 | 2.8653 |
F1 extrapolates reasonably to rock, hip-hop, country, folk, and other commercial-adjacent genres without ever having seen their genre tokens. It struggles on classical (functional tonality outside the pop/jazz training distribution) and to a lesser extent on bossa and gospel. The matching lora-<genre> adapters lift each of these numbers; see the LoRA adapter cards under the same PearlLeeStudio/ namespace for the LoRA-vs-base Δ tables. Source: ai/results/f1_per_genre_baseline.csv.
Intended use and limitations
Recommended for chord-composition workflows targeting pop, rock, CCM, K-pop, J-pop, and modern country with optional jazz coloration. F4 (ft-pop29) is the symmetric jazz-leaning endpoint; F3 (ft-pop50) is the balanced middle.
Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.
Usage
The repo bundles the project's model.py and tokenizer.py at the repo root, so external users can load the checkpoint end-to-end without cloning anything from GitHub. snapshot_download materializes the full repo on disk, and prepending that directory to sys.path makes the bundled model.py and tokenizer.py importable.
Required dependencies: torch, huggingface_hub.
```python
import sys

import torch
from huggingface_hub import snapshot_download

# Download the full repo (model.py, tokenizer.py, best.pt, config.json).
ckpt_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80")
sys.path.insert(0, ckpt_dir)  # so the next two imports resolve

from model import MusicTransformer
from tokenizer import ChordTokenizer

tokenizer = ChordTokenizer()
ckpt = torch.load(f"{ckpt_dir}/best.pt", map_location="cpu", weights_only=False)

model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major; ask for a pop-flavoured continuation.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "pop",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
# Drop the trailing EOS token so the model continues the progression.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])

with torch.no_grad():
    # Prompt + 32 new tokens stays well under max_seq_len=256.
    for _ in range(32):
        logits = model(ids)
        # Sample the next token from the last position at temperature 0.8.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1,
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break

print(tokenizer.decode(ids[0].tolist()))
```
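The `/ 0.8` in the sampling loop is temperature scaling: dividing logits by a temperature below 1 sharpens the softmax toward the highest-scoring chords before `torch.multinomial` draws from it. The stdlib-only sketch below mirrors that step for illustration; `softmax_with_temperature`, `sample_next`, and the toy logits are hypothetical and not part of the bundled model.py.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits, temperature=0.8, rng=random):
    """Draw one token index, mirroring torch.multinomial(softmax(logits / T), 1)."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # illustrative scores for four candidate chord tokens
for t in (1.0, 0.8, 0.5):
    print(t, round(softmax_with_temperature(toy_logits, t)[0], 3))
```

Lower temperatures concentrate probability on the top candidate, which keeps continuations closer to the most likely (pop-diatonic) chords.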
For per-genre adaptation beyond pop and jazz, see the 11 LoRA adapter repos at PearlLeeStudio — they chain on top of this base.
Per-genre real-song eval (held-out 130-song set, 2026-05)
First per-genre evaluation of ft-pop80 beyond the pop/jazz split that the original paper reports.
Eval results
| Genre | n_songs | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|---|
| pop | 10 | 86.68 | 96.01 | 0.5734 |
| rock | 10 | 86.69 | 97.48 | 0.4578 |
| jazz | 10 | 64.96 | 81.16 | 1.8958 |
| blues | 10 | 81.52 | 93.91 | 0.8410 |
| bossa | 10 | 81.43 | 95.47 | 0.7825 |
| classical | 10 | 49.55 | 81.17 | 2.2389 |
| country | 10 | 85.89 | 98.44 | 0.5152 |
| electronic | 10 | 87.39 | 98.45 | 0.5072 |
| folk | 10 | 85.04 | 98.92 | 0.5244 |
| funk | 10 | 83.85 | 96.03 | 0.6811 |
| gospel | 10 | 79.79 | 96.85 | 0.7367 |
| hip_hop | 10 | 90.66 | 98.59 | 0.3957 |
| rnb_soul | 10 | 85.11 | 97.07 | 0.5877 |
On this eval set F1 peaks on hip_hop (90.66%) and struggles most on classical (49.55%).
This is auxiliary signal; the 11 per-genre LoRAs (sister lora-* repos) are the recommended path for production use on non-pop, non-jazz genres. For the 9 genres that have no [GENRE:X] token in the F-series vocabulary (vocab=351), the F-series cells show what the base model produces under [GENRE:none] conditioning.
Eval dataset composition
130 songs total, 10 per genre × 13 genres, drawn from the same splits/val.jsonl + splits/test.jsonl partitions that were held out from training for every F-series model, so there is no train-set leakage. Built by ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 (deterministic).
| Genre | n | Source(s) | Bar range | Avg duration (s) | Named titles |
|---|---|---|---|---|---|
| pop | 10 | billboard | 58–116 | 189 | 10/10 |
| rock | 10 | chordonomicon_rock | 52–87 | 127 | 0/10 |
| jazz | 10 | choco:jazz-corpus, choco:real-book, jazzstandards, jht | 16–89 | 72 | 10/10 |
| blues | 10 | chordonomicon_blues | 24–46 | 93 | 0/10 |
| bossa | 10 | chordonomicon_bossa | 24–78 | 88 | 0/10 |
| classical | 10 | chordonomicon_classical | 11–40 | 60 | 10/10 |
| country | 10 | chordonomicon_country | 30–81 | 110 | 0/10 |
| electronic | 10 | chordonomicon_electronic | 25–84 | 89 | 0/10 |
| folk | 10 | chordonomicon_folk | 33–82 | 114 | 0/10 |
| funk | 10 | chordonomicon_funk | 30–60 | 92 | 0/10 |
| gospel | 10 | chordonomicon_gospel | 24–85 | 98 | 0/10 |
| hip_hop | 10 | chordonomicon_hip_hop | 24–81 | 136 | 0/10 |
| rnb_soul | 10 | chordonomicon_rnb_soul | 34–82 | 128 | 0/10 |
Source license summary: McGill Billboard (CC0, named pop songs), Jazz Harmony Treebank / JazzStandards / WJazzD (Public / community-redistributed, named jazz standards), Bach chorales via music21 (public domain, named pieces), Chordonomicon per-genre subsets (CC BY-NC 4.0; titles are Spotify track IDs by upstream dataset policy — progressions are real songs). See docs/EVAL.md for full breakdown.
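The deterministic 10-per-genre draw attributed to build_eval_real_songs.py can be sketched as a seeded sample per genre over a fixed ordering. The script's internals are not shown on this card, so the record fields (`genre`, `id`), the function name `pick_eval_songs`, and the sort-then-sample scheme are assumptions; the point is only that fixing both the seed and the iteration order makes the selection reproducible.

```python
import random
from collections import defaultdict


def pick_eval_songs(records, per_genre=10, seed=42):
    """Hypothetical sketch of a seeded, per-genre draw from held-out records."""
    by_genre = defaultdict(list)
    for rec in records:
        by_genre[rec["genre"]].append(rec)
    rng = random.Random(seed)
    picked = {}
    for genre in sorted(by_genre):  # fixed genre order keeps the RNG stream deterministic
        pool = sorted(by_genre[genre], key=lambda r: r["id"])  # fixed within-genre order
        picked[genre] = rng.sample(pool, min(per_genre, len(pool)))
    return picked


GENRES = ["pop", "rock", "jazz", "blues", "bossa", "classical", "country",
          "electronic", "folk", "funk", "gospel", "hip_hop", "rnb_soul"]
# Toy held-out pool: 30 placeholder records per genre.
toy = [{"genre": g, "id": f"{g}-{i:03d}"} for g in GENRES for i in range(30)]
picked = pick_eval_songs(toy)
print(sum(len(v) for v in picked.values()))
```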
Methodology
Teacher-forced next-token cross-entropy, top-1, and top-5 over each song's token sequence (BOS + key + time_sig + genre + bars + EOS, truncated to max_seq_len=256). This uses the same evaluate() call that produced ai/results/f1_per_genre_baseline.csv, narrowed to the curated 130-song subset. These are token-level metrics, not a generation-quality eval; the free-generation comparison with R1 Sethares + R2 theory RAG rerank is documented separately in ai/results/eval_report.md.
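Teacher-forced scoring means each position is predicted from the true prefix, never from the model's own earlier outputs. The stdlib sketch below illustrates that loop under stated assumptions: `teacher_forced_metrics` and the `logits_fn(prefix)` interface are hypothetical (a real evaluation would get all positions from one causally masked forward pass), every token after the first is scored, and results are per sequence rather than pooled across songs as evaluate() may do.

```python
import math


def teacher_forced_metrics(token_ids, logits_fn, max_seq_len=256, k=5):
    """Mean cross-entropy, top-1 and top-k over one sequence, predicting each token from its true prefix."""
    ids = token_ids[:max_seq_len]
    nll, top1, topk, n = 0.0, 0, 0, 0
    for t in range(1, len(ids)):
        logits = logits_fn(ids[:t])  # scores for position t given the ground-truth prefix
        target = ids[t]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
        nll += log_z - logits[target]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top1 += ranked[0] == target
        topk += target in ranked[:k]
        n += 1
    return {"loss": nll / n, "top1": top1 / n, f"top{k}": topk / n}


def next_chord_toy_model(prefix, vocab_size=10):
    """Toy 'model' that always bets on prefix[-1] + 1."""
    logits = [0.0] * vocab_size
    logits[(prefix[-1] + 1) % vocab_size] = 5.0
    return logits


print(teacher_forced_metrics([0, 1, 2, 3, 9], next_chord_toy_model))
```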
Caveats:
- The classical val partition is intrinsically small (37 sequences in the full eval), and the 10-song subset here has even wider confidence intervals. The directional finding (LoRA helps a lot on Bach harmony) is robust; the exact pp deltas are noisy.
- F-series numbers on the 9 LoRA-only genres are computed without a genre tag (vocab=351 has no [GENRE:country] token, etc.). This is the realistic "F-series alone" condition, not a controlled ablation.
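One way to see why 10 songs per genre gives noisy deltas is a percentile bootstrap over per-song scores. This is an illustrative sketch only: the per-song accuracies below are invented, `bootstrap_ci` is a hypothetical helper, and averaging per-song scores is not the same as the token-pooled metric in the tables. It shows that the same spread of scores yields a noticeably wider interval with 10 songs than with 40.

```python
import random
import statistics


def bootstrap_ci(per_song_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-song scores."""
    rng = random.Random(seed)
    n = len(per_song_scores)
    means = sorted(
        statistics.fmean(rng.choices(per_song_scores, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Invented per-song top-1 accuracies, for illustration only.
scores_10 = [0.31, 0.42, 0.48, 0.55, 0.37, 0.61, 0.45, 0.52, 0.66, 0.58]
print(bootstrap_ci(scores_10))       # 10 songs
print(bootstrap_ci(scores_10 * 4))   # same spread, 40 songs
```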
Source CSV: ai/results/real_song_eval.csv (17 models × 130 songs, long format).
Training-data licenses
| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |
Citation
Cite the original mix-ratio paper. The companion per-genre LoRA paper (chord-symbol time-series adaptation) is in preparation; its arXiv ID will be added here once posted.
```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}

@misc{lee2026chordtimeseries,
  title  = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
  author = {Lee, Jinju},
  year   = {2026},
  note   = {arXiv preprint, ID TBD},
}
```