TheArtist Music Transformer — F4 (Pop 1K Mix, jazz-leaning)

Jazz-adapted chord model with a 1,000-sequence pop rehearsal buffer. The jazz-leaning endpoint of the mix-ratio sweep. Highest jazz top-1 in the collection (81.50%) at the cost of 1.22 pop points.

One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Demo

Watch TheArtist in action on YouTube — interactive staff editor, MIDI input, AI generation with live progress, and per-genre LoRA playback across the 13-genre vocabulary.

Model summary

Field	Value
Architecture	Music Transformer with relative positional attention
Parameters	25,661,440
Vocabulary size	351 tokens
Max sequence length	256
d_model / heads / FFN / layers	512 / 8 / 2048 / 8
Fine-tune resumed from	Phase 0 pop baseline
Best epoch	6

Training data

All 1,513 jazz training sequences plus 1,000 pop rehearsal sequences (seed 42). Pop:jazz ≈ 0.66:1, that is, less pop than jazz in the mix.
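
As a rough illustration of how such a rehearsal mix could be assembled, the sketch below draws a fixed-size pop subset with a seeded RNG and concatenates it with the full jazz set. The function name `build_rehearsal_mix`, the pop pool size, and the placeholder records are hypothetical; the project's actual data pipeline is not shown in this card and may differ.

```python
import random

def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=1000, seed=42):
    """Return all jazz sequences plus a seeded random subset of pop sequences.

    Hypothetical helper: illustrates the mix described above, not the
    project's real training script.
    """
    rng = random.Random(seed)              # local RNG so the draw is reproducible
    pop_subset = rng.sample(pop_seqs, n_pop)
    mix = list(jazz_seqs) + pop_subset
    rng.shuffle(mix)                       # interleave genres before batching
    return mix

# Placeholder data with the sizes quoted in this card (pop pool size is made up).
jazz = [("jazz", i) for i in range(1513)]
pop_pool = [("pop", i) for i in range(5000)]

mix = build_rehearsal_mix(jazz, pop_pool)
n_pop = sum(1 for genre, _ in mix if genre == "pop")
n_jazz = len(mix) - n_pop
print(f"pop:jazz = {n_pop / n_jazz:.2f}:1")  # 1000 / 1513 rounds to 0.66
```

Using a local `random.Random(seed)` instead of the global RNG keeps the rehearsal draw reproducible regardless of what other code has sampled beforehand.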

Evaluation (held-out per-genre test sets)

Metric	Pop test	Jazz test
Top-1 accuracy	83.02%	81.50%
Top-5 accuracy	96.93%	92.59%
Perplexity	1.81	2.26
Δ top-1 vs. Phase 0 baseline (pp)	−1.22	+8.64
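
The Δ row lets a reader back out the Phase 0 baseline's scores. The snippet below does that arithmetic, assuming Δ is F4's top-1 minus the baseline's top-1 in percentage points (the reading suggested by the 1.22-point pop cost quoted at the top of this card). The variable names are illustrative, and the implied baseline values are derived here rather than quoted from the paper.

```python
# Top-1 accuracy (%) and delta vs. Phase 0 baseline, copied from the table above.
f4_top1 = {"pop": 83.02, "jazz": 81.50}
delta_pp = {"pop": -1.22, "jazz": 8.64}

# Assumption: delta = F4 - baseline, so baseline = F4 - delta.
baseline_top1 = {g: round(f4_top1[g] - delta_pp[g], 2) for g in f4_top1}

for genre, acc in baseline_top1.items():
    print(f"Phase 0 {genre} top-1 (implied): {acc:.2f}%")
```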

F4 is the jazz-leaning endpoint of the mix-ratio sweep. It produces the most jazz-flavoured continuations among the released checkpoints, with secondary dominants, tritone substitutions, modal interchange, and ii-V chains across distant keys. The cost is 1.22 points of pop top-1 accuracy relative to the Phase 0 baseline. Qualitative samples (paper §6.4) on a minor ii-V prompt show bebop-style harmonic motion, which this checkpoint commits to more strongly than F3.

Intended use

Recommended for jazz-flavoured chord composition where the user is willing to trade some pop fluency for stronger jazz identity. F3 (ft-pop50) is the balanced alternative; F1 (ft-pop80) is the symmetric pop-leaning endpoint.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

The repo bundles the project's model.py and tokenizer.py at the repo root, so external users can load the checkpoint end-to-end without cloning anything from GitHub. snapshot_download materializes the full repo on disk, and inserting that directory into sys.path makes the bundled model.py and tokenizer.py importable.

Required dependencies: torch, huggingface_hub.

import sys
import torch
from huggingface_hub import snapshot_download

# Download the full repo (model.py, tokenizer.py, best.pt, config.json).
ckpt_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop29")
sys.path.insert(0, ckpt_dir)  # so the next two imports resolve

from model import MusicTransformer
from tokenizer import ChordTokenizer

tokenizer = ChordTokenizer()
ckpt = torch.load(f"{ckpt_dir}/best.pt", map_location="cpu", weights_only=False)
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major; ask for a jazz-flavoured continuation.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "jazz",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
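# Encode the prompt and drop the trailing EOS so the model continues it.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])  # shape (1, prompt_len)
# Sample up to 32 more tokens at temperature 0.8, stopping early on EOS.
with torch.no_grad():
    for _ in range(32):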
        logits = model(ids)
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1,
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))
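
The `/ 0.8` in the loop above is a sampling temperature. The framework-free sketch below shows what that division does to a next-token distribution. The logits are made-up numbers, not model output, and `softmax_with_temperature` is an illustrative helper rather than part of the bundled code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5, -1.0]            # illustrative values only

for t in (1.0, 0.8, 0.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Temperatures below 1 concentrate probability on the top-ranked tokens, so 0.8 gives slightly more conservative continuations than sampling at 1.0. Raising the value toward 1.0 or above increases variety.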

For per-genre adaptation beyond pop and jazz, see the 11 LoRA adapter repos at PearlLeeStudio — they chain on top of this base.

Per-genre real-song eval (held-out 130-song set, 2026-05)

This is the first per-genre evaluation of ft-pop29 beyond the pop/jazz split that the original paper reports.

Eval results

Genre	n_songs	Top-1 (%)	Top-5 (%)	val_loss
pop	10	85.57	95.91	0.5859
rock	10	87.28	97.50	0.4677
jazz	10	71.42	85.54	1.3367
blues	10	81.99	93.86	0.7970
bossa	10	82.62	95.73	0.7226
classical	10	49.65	81.51	2.1079
country	10	86.30	98.18	0.5191
electronic	10	86.81	98.48	0.5100
folk	10	84.81	98.63	0.5337
funk	10	83.39	96.13	0.6959
gospel	10	80.25	96.58	0.7343
hip_hop	10	90.34	98.58	0.3982
rnb_soul	10	85.04	97.04	0.5907

On this eval set F4 peaks on hip_hop (90.34%) and struggles most on classical (49.65%). Treat these numbers as an auxiliary signal: the 11 per-genre LoRAs (sister lora-* repos) are the recommended path for production use on genres beyond pop and jazz. F-series cells for the 9 genres that have no [GENRE:X] token in the F-series vocabulary (vocab=351) show what the base model produces under [GENRE:none] conditioning.
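
The peak and trough can be read off the table with a few lines of Python. The dictionary below simply copies the Top-1 column; nothing here is new data.

```python
# Top-1 accuracy (%) per genre, copied from the eval-results table.
top1 = {
    "pop": 85.57, "rock": 87.28, "jazz": 71.42, "blues": 81.99,
    "bossa": 82.62, "classical": 49.65, "country": 86.30,
    "electronic": 86.81, "folk": 84.81, "funk": 83.39,
    "gospel": 80.25, "hip_hop": 90.34, "rnb_soul": 85.04,
}

best = max(top1, key=top1.get)
worst = min(top1, key=top1.get)
print(f"best:  {best} ({top1[best]:.2f}%)")    # hip_hop (90.34%)
print(f"worst: {worst} ({top1[worst]:.2f}%)")  # classical (49.65%)

# Genres ranked from strongest to weakest, useful for spotting where
# the base model alone is least reliable.
ranking = sorted(top1, key=top1.get, reverse=True)
print(ranking[-3:])
```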

Eval dataset composition

130 songs total: 10 per genre × 13 genres. They are drawn from the splits/val.jsonl and splits/test.jsonl partitions, which were held out from training for every F-series model, so there is no train-set leakage. Built by ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 (deterministic).
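
A seeded, per-genre draw like the one described can be sketched as follows. This is not the contents of build_eval_real_songs.py, which this card does not show; the `genre` field name, the `sample_per_genre` helper, and the toy records are all assumptions for illustration.

```python
import random
from collections import defaultdict

def sample_per_genre(records, per_genre=10, seed=42):
    """Pick `per_genre` records from each genre with a fixed seed."""
    by_genre = defaultdict(list)
    for rec in records:
        by_genre[rec["genre"]].append(rec)

    rng = random.Random(seed)
    picked = []
    for genre in sorted(by_genre):             # sorted keys keep the draw order stable
        picked.extend(rng.sample(by_genre[genre], per_genre))
    return picked

# Toy held-out pool: 13 genres x 40 songs each (sizes are made up).
genres = ["pop", "rock", "jazz", "blues", "bossa", "classical", "country",
          "electronic", "folk", "funk", "gospel", "hip_hop", "rnb_soul"]
pool = [{"genre": g, "song_id": f"{g}_{i}"} for g in genres for i in range(40)]

subset = sample_per_genre(pool)
print(len(subset))  # 13 genres x 10 songs = 130
```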

Genre	n	Source(s)	Bar range	Avg duration (s)	Named titles
pop	10	billboard	58–116	189	10/10
rock	10	chordonomicon_rock	52–87	127	0/10
jazz	10	choco:jazz-corpus, choco:real-book, jazzstandards, jht	16–89	72	10/10
blues	10	chordonomicon_blues	24–46	93	0/10
bossa	10	chordonomicon_bossa	24–78	88	0/10
classical	10	chordonomicon_classical	11–40	60	10/10
country	10	chordonomicon_country	30–81	110	0/10
electronic	10	chordonomicon_electronic	25–84	89	0/10
folk	10	chordonomicon_folk	33–82	114	0/10
funk	10	chordonomicon_funk	30–60	92	0/10
gospel	10	chordonomicon_gospel	24–85	98	0/10
hip_hop	10	chordonomicon_hip_hop	24–81	136	0/10
rnb_soul	10	chordonomicon_rnb_soul	34–82	128	0/10

Source license summary: McGill Billboard (CC0, named pop songs); Jazz Harmony Treebank / JazzStandards / WJazzD (public or community-redistributed, named jazz standards); Bach chorales via music21 (public domain, named pieces); Chordonomicon per-genre subsets (CC BY-NC 4.0; titles are Spotify track IDs by upstream dataset policy, but the progressions come from real songs). See docs/EVAL.md for the full breakdown.

Methodology

Teacher-forced next-token cross-entropy, top-1, and top-5 over each song's token sequence (BOS + key + time_sig + genre + bars + EOS, truncated to max_seq_len=256). This is the same evaluate() call that produced ai/results/f1_per_genre_baseline.csv, narrowed to the curated 130-song subset. These are token-level metrics, not a generation-quality eval (free-generation comparison with R1 Sethares + R2 theory RAG rerank is documented separately in ai/results/eval_report.md).
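
To make the token-level metrics concrete, here is a framework-free sketch of teacher-forced scoring for one sequence: at each position the model sees the true prefix and is scored on the true next token. The project's own evaluate() is not reproduced in this card; `score_sequence` and the toy logits are hypothetical, and a real run would take logits from the model rather than hand-written lists. Perplexity is taken here as the exponential of the mean cross-entropy, which is the standard definition but an assumption about how the card's figures were computed.

```python
import math

def score_sequence(step_logits, targets, k=5):
    """Teacher-forced cross-entropy, top-1 and top-k accuracy for one sequence.

    step_logits[t] holds the logits predicted for position t given the true
    tokens before t; targets[t] is the true token id at that position.
    """
    total_nll, top1_hits, topk_hits = 0.0, 0, 0
    for logits, target in zip(step_logits, targets):
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total_nll += log_z - logits[target]          # -log softmax(logits)[target]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top1_hits += ranked[0] == target
        topk_hits += target in ranked[:k]
    n = len(targets)
    loss = total_nll / n
    return {"loss": loss, "perplexity": math.exp(loss),
            "top1": top1_hits / n, "topk": topk_hits / n}

# Toy example: vocabulary of 6 tokens, 3 scored positions.
toy_logits = [
    [3.0, 0.1, 0.0, -1.0, -2.0, -3.0],   # target 0 is ranked first
    [0.5, 2.0, 1.0, 0.0, -1.0, -4.0],    # target 2 is ranked second
    [1.0, 0.9, 0.8, 0.7, 0.6, -5.0],     # target 5 is ranked last (outside top-5)
]
toy_targets = [0, 2, 5]
print(score_sequence(toy_logits, toy_targets))
```

In the toy run, one of three targets is the top prediction and two of three fall inside the top five, so top-1 is 1/3 and top-5 is 2/3.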

Caveats:

The classical val partition is intrinsically small (37 sequences in the full eval), and the 10-song subset here has even wider confidence bands. The directional finding (LoRA helps a lot on Bach harmony) is robust, but exact pp deltas are noisy.
F-series numbers on the 9 LoRA-only genres are computed without a genre tag (vocab=351 has no [GENRE:country] token, etc.). This is the realistic "F-series alone" condition, not a controlled ablation.

Source CSV: ai/results/real_song_eval.csv (17 models × 130 songs, long format).

Training-data licenses

Dataset	License
Chordonomicon	Public (user-generated)
McGill Billboard	CC0
Jazz Harmony Treebank	Public
JazzStandards (iReal Pro)	Community redistribution
Weimar Jazz Database	ODbL
JAAH	Research-use public

Citation

Cite the original mix-ratio paper. The companion per-genre LoRA paper (chord-symbol time-series adaptation) is now on arXiv: arXiv:2606.07334.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}

@misc{lee2026chordtimeseries,
  title         = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2606.07334},
  archivePrefix = {arXiv}
}
