TheArtist Music Transformer — LoRA Adapter (Funk)
LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward funk chord progressions.
This model is part of the research presented in the paper How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling.
- Code: GitHub - PearlLeeStudio/TheArtist
- Paper: arXiv:2606.07334
One of eleven per-genre adapters released alongside the paper. This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.
Demo
Watch TheArtist in action on YouTube — interactive staff editor, MIDI input, AI generation with live progress, and per-genre LoRA playback across the 13-genre vocabulary.
Adapter summary
| Field | Value |
|---|---|
| Base model | PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80 (F1, 25.6M params) |
| Adapter type | LoRA (Q/K/V projections) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | w_q, w_k, w_v |
| Trainable parameters | |
| Adapter file size | ~0.8 MB |
| Base vocabulary | 351 tokens (jazz/pop) |
| Vocabulary extension | +8 genre tokens (embedding_extension.pt) |
| Training epochs | 8 |
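The table leaves the trainable-parameter count blank. As a rough cross-check, it can be estimated from the architecture given in the Usage code (d_model=512, n_layers=8). This is a minimal sketch under several assumptions that the card does not state:

- each of w_q, w_k and w_v is a 512 → 512 linear layer in all 8 layers;
- no bias terms are trained;
- the weights are stored as fp32.

LoRA adds two matrices per target: A (r × d_in) and B (d_out × r). PEFT scales their product by lora_alpha / r.

```python
# Rough LoRA size estimate for the adapter summary table.
# ASSUMPTIONS (not stated in the table): every w_q / w_k / w_v is a
# 512 -> 512 linear layer, all 8 layers are targeted, no bias terms are
# trained, and weights are stored as fp32 (4 bytes each).

D_MODEL = 512
N_LAYERS = 8
TARGET_MODULES = ("w_q", "w_k", "w_v")
RANK = 8
ALPHA = 16


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


per_module = lora_params(D_MODEL, D_MODEL, RANK)
total_params = per_module * len(TARGET_MODULES) * N_LAYERS
fp32_megabytes = total_params * 4 / 1e6
scaling = ALPHA / RANK  # factor applied to B @ A in the forward pass

print(f"per module: {per_module:,}")
print(f"total:      {total_params:,}")
print(f"fp32 size:  {fp32_megabytes:.2f} MB")
print(f"scaling:    {scaling}")
```

Under these assumptions the estimate (about 0.79 MB) is consistent with the ~0.8 MB adapter file size listed above. For the authoritative number, sum p.numel() over the parameters whose names contain "lora_" on the loaded model.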
Training data
Source
2,837 chord-progression sequences in the funk subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.
Filter rule
Songs whose free-form genres column contains any of {funk}.
(See ai/training/extract_genre_subsets.py:GENRE_FILTERS for the full extraction logic. main filters match the main_genre column, and genres_any filters substring-match the free-form genres column. Each song is assigned to its first matching genre, so no song is counted in more than one genre subset.)
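The matching rule above can be sketched as follows. FILTERS and assign_genre are hypothetical stand-ins, not the repository's GENRE_FILTERS code. The sketch assumes case-insensitive matching and assumes that rules are tried in dict order. The second "soul" rule is illustrative only; it shows how first-match assignment prevents double-counting.

```python
# Hypothetical sketch of the filter semantics described above; the real
# GENRE_FILTERS dict and matching code in extract_genre_subsets.py may differ.
# ASSUMPTIONS: case-insensitive matching, filters tried in dict order.

FILTERS = {
    "funk": {"genres_any": ["funk"]},   # the rule listed for this adapter
    "soul": {"genres_any": ["soul"]},   # illustrative second rule (hypothetical)
}


def assign_genre(row: dict, filters: dict):
    """Return the first genre whose rule matches this song's row, or None."""
    main_genre = (row.get("main_genre") or "").lower()
    free_form = (row.get("genres") or "").lower()
    for name, rule in filters.items():
        # "main" rules compare against the main_genre column
        if main_genre in rule.get("main", []):
            return name
        # "genres_any" rules are substring tests on the free-form column
        if any(sub in free_form for sub in rule.get("genres_any", [])):
            return name
    return None


songs = [
    {"main_genre": "pop", "genres": "funk, soul"},  # matches both rules
    {"main_genre": "rock", "genres": "hard rock"},  # matches neither
]
print([assign_genre(s, FILTERS) for s in songs])  # ['funk', None]
```

Because genres_any is a substring test, a value such as "funk rock" or "funky" also lands in the funk subset. First-match assignment keeps a song tagged both funk and soul from appearing in two subsets.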
Splits (song-level, seed=42, 80/10/10)
| Partition | Songs | Used for |
|---|---|---|
| train | 2,269 | this LoRA's training (12-key augmented → 27,228 sequences) |
| val | 283 | rank-sweep eval + best-epoch selection during training |
| test | 285 | held aside for future paired analysis |
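The split sizes follow from a plain 80/10/10 partition of the 2,837 songs. The 27,228 training sequences are the 2,269 training songs times 12 keys. This minimal sketch makes four assumptions:

- int() truncation sets the train and val sizes;
- Python's random.Random(42) performs the shuffle;
- chord spelling is sharp-only;
- every chord symbol starts with a pitch name.

split_songs, transpose_chord and augment_12_keys are hypothetical helpers. The sizes should match the table, but the exact song-to-partition assignment will not match the repository's.

```python
import random

# Hypothetical helpers illustrating the song-level split and 12-key
# augmentation described above; not the repository's implementation.

PITCHES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
ENHARMONIC = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#",
              "Cb": "B", "Fb": "E", "E#": "F", "B#": "C"}


def split_songs(song_ids, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle whole songs once, then cut into train / val / test."""
    ids = list(song_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def transpose_chord(chord: str, semitones: int) -> str:
    """Shift the chord root; the quality suffix is kept (slash bass not handled)."""
    root_len = 2 if len(chord) > 1 and chord[1] in "#b" else 1
    root, quality = chord[:root_len], chord[root_len:]
    idx = PITCHES.index(ENHARMONIC.get(root, root))
    return PITCHES[(idx + semitones) % 12] + quality


def augment_12_keys(bars):
    """One copy of the progression per semitone shift 0..11."""
    return [[[transpose_chord(c, k) for c in bar] for bar in bars] for k in range(12)]


train, val, test = split_songs(range(2837))
print(len(train), len(val), len(test))   # 2269 283 285
print(len(train) * 12)                    # 27228 augmented training sequences
print(augment_12_keys([["Cmaj7"], ["Fmaj7"]])[2])  # [['Dmaj7'], ['Gmaj7']]
```

Splitting by song before augmenting ensures that no transposed copy of a validation or test song leaks into training.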
Vocabulary
- Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
- Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:funk] token)
- Final vocab: 359 tokens (the extension rows are stored alongside the adapter in embedding_extension.pt)
Reproducibility
# 1. Download the raw Chordonomicon CSV into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres funk --merge
# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/funk.yaml
Hyperparameters: 8 epochs · batch 32 × gradient accumulation 2 (effective batch 64) · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt is the epoch checkpoint with the lowest val_loss.
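The hyperparameters above imply a concrete number of optimizer steps. This back-of-envelope sketch rests on three assumptions:

- the warmup is linear;
- the learning rate stays constant after warmup;
- a trailing micro-batch that cannot fill an accumulation window is dropped.

lr_at is a hypothetical helper, and the real training script may differ on all three points.

```python
import math

# Back-of-envelope step counts for the hyperparameters above.
# ASSUMPTIONS: linear warmup, constant LR after warmup, and a trailing
# micro-batch that cannot fill an accumulation window is dropped. The real
# training script may differ on all three.

TRAIN_SEQS = 27_228       # 2,269 songs x 12 keys
MICRO_BATCH = 32
ACCUM_STEPS = 2
BASE_LR = 3e-4
EPOCHS = 8

effective_batch = MICRO_BATCH * ACCUM_STEPS                  # sequences per optimizer step
micro_batches_per_epoch = math.ceil(TRAIN_SEQS / MICRO_BATCH)
steps_per_epoch = micro_batches_per_epoch // ACCUM_STEPS     # optimizer steps
warmup_steps = steps_per_epoch                               # "1-epoch warmup"
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Learning rate for a 0-based optimizer step under the assumptions above."""
    if step < warmup_steps:
        return BASE_LR * (step + 1) / warmup_steps
    return BASE_LR


print(effective_batch, micro_batches_per_epoch, steps_per_epoch, total_steps)
# 64 851 425 3400
```

If the final partial window is also stepped, each epoch has 426 optimizer steps instead of 425.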
Genre character
Funk grooves with extended dominants and altered sevenths
Rank sweep
The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Numbers are validation-set token-level metrics (no key augmentation).
| Rank | val_loss | val_top1 (%) | val_top5 (%) | Δ top-1 vs F1 base (pp) |
|---|---|---|---|---|
| r=4 | 0.5694 | 84.62 | 96.05 | +2.08 |
| r=8 | 0.5688 | 84.64 | 96.05 | +2.10 ← selected |
| r=16 | 0.5688 | 84.62 | 96.05 | +2.08 |
| r=32 | 0.5714 | 84.60 | 95.23 | +2.06 |
| r=64 | 0.5697 | 84.72 | 96.07 | +2.18 |
Selection criterion: minimum validation cross-entropy loss; val_top1 as tiebreaker.
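The selection rule can be replayed on the table's own numbers. In this minimal sketch (select_rank is a hypothetical helper), the sort key is val_loss ascending, then val_top1 descending.

```python
# Rank-sweep rows copied from the table above: (rank, val_loss, val_top1 %).
SWEEP = [
    (4, 0.5694, 84.62),
    (8, 0.5688, 84.64),
    (16, 0.5688, 84.62),
    (32, 0.5714, 84.60),
    (64, 0.5697, 84.72),
]


def select_rank(rows):
    """Lowest val_loss wins; a higher val_top1 breaks exact ties."""
    return min(rows, key=lambda row: (row[1], -row[2]))[0]


print(select_rank(SWEEP))  # 8
```

r=8 and r=16 tie at 0.5688, so the top-1 tiebreaker decides in favour of r=8. r=64 has the highest top-1 but loses on loss.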
Evaluation
Validation token-level metrics on the genre-specific val split (283 sequences, no key augmentation).
| Metric | F1 base alone | F1 + this LoRA | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 82.54 | 84.64 | +2.10 |
| Top-5 accuracy (%) | 94.38 | 96.05 | +1.67 |
| Cross-entropy loss | 0.7878 | 0.5688 | -0.2190 |
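This minimal pure-Python sketch shows how the three metrics can be computed under teacher forcing. Each row of logits is the model's score vector for the next token, given the ground-truth prefix. token_metrics is a hypothetical helper. It pools tokens across sequences (micro-averaging), whereas the evaluation script may average per song. It also assumes that padding positions are identified and skipped by token id.

```python
import math


def token_metrics(logits, targets, pad_id=None):
    """Top-1 %, top-5 %, and mean cross-entropy (nats) over non-pad positions.

    logits:  one list of vocabulary scores per predicted position
    targets: the ground-truth next-token id for each position (teacher forcing)
    """
    n = hits1 = hits5 = 0
    nll = 0.0
    for row, target in zip(logits, targets):
        if target == pad_id:
            continue
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        nll += log_z - row[target]  # -log softmax(row)[target]
        ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        hits1 += ranked[0] == target
        hits5 += target in ranked[:5]
        n += 1
    return 100.0 * hits1 / n, 100.0 * hits5 / n, nll / n


# Toy 3-token vocabulary: the first position is predicted correctly,
# the second target is ranked last (still inside the top 5 of only 3 tokens).
top1, top5, ce = token_metrics([[5.0, 1.0, 0.0], [5.0, 1.0, 0.0]], [0, 2])
print(top1, top5, round(ce, 4))
```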
Real-song eval
Mean top-1 accuracy, top-5 accuracy, and cross-entropy loss on 10 held-out real funk songs from ai/data/eval_real_songs.jsonl, computed with teacher forcing (each next token is predicted from the ground-truth prefix).
| Model | Top-1 (%) | Top-5 (%) | Cross-entropy loss |
|---|---|---|---|
| F1 base alone | 83.85 | 96.03 | 0.6811 |
| F1 + this LoRA | 84.71 | 96.37 | 0.5912 |
| Δ | +0.87 | +0.33 | -0.0899 |
License and use
The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.
Usage
Both the base repo and this LoRA repo ship the project's model.py and tokenizer.py at the repo root, so external users can load this adapter end-to-end without cloning anything from GitHub.
Required dependencies: torch, huggingface_hub, peft, safetensors.
import sys
import torch
import torch.nn as nn
from huggingface_hub import snapshot_download
from peft import PeftModel
# 1. Download the base + LoRA repos. Both bundle model.py and tokenizer.py.
base_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80")
lora_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-funk")
sys.path.insert(0, base_dir) # so the next two imports resolve
from model import MusicTransformer
from tokenizer import ChordTokenizer
# 2. Extended tokenizer (351 base + 8 new genre tokens = 359).
tokenizer = ChordTokenizer(include_extra_genres=True)
# 3. Build the model at the BASE vocab size (351) so the base checkpoint loads without shape mismatches.
BASE_VOCAB = 351
model = MusicTransformer(
vocab_size=BASE_VOCAB,
d_model=512, n_heads=8, d_ff=2048, n_layers=8,
max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
ckpt = torch.load(f"{base_dir}/best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])
# 4. Grow token_emb + out_proj from 351 -> 359 then overlay the LoRA's trained extension rows.
def _grow_to_extended_vocab(m, new_vocab, none_id):
d = m.token_emb.embedding_dim
new_emb = nn.Embedding(new_vocab, d, padding_idx=m.token_emb.padding_idx)
with torch.no_grad():
new_emb.weight[:m.token_emb.num_embeddings] = m.token_emb.weight
for i in range(m.token_emb.num_embeddings, new_vocab):
new_emb.weight[i] = m.token_emb.weight[none_id]
m.token_emb = new_emb
new_out = nn.Linear(d, new_vocab, bias=False)
with torch.no_grad():
new_out.weight[:m.out_proj.out_features] = m.out_proj.weight
for i in range(m.out_proj.out_features, new_vocab):
new_out.weight[i] = m.out_proj.weight[none_id]
m.out_proj = new_out
_grow_to_extended_vocab(model, tokenizer.vocab_size, tokenizer.encode_genre("none"))
ext = torch.load(f"{lora_dir}/embedding_extension.pt",
map_location="cpu", weights_only=False)
model.token_emb.load_state_dict(ext["token_emb_state"])
model.out_proj.load_state_dict(ext["out_proj_state"])
# 5. Apply the LoRA adapter (the adapter files live at lora_dir/adapter/).
model = PeftModel.from_pretrained(model, f"{lora_dir}/adapter")
model.eval()
# 6. Sample a funk continuation (temperature 0.8, at most 32 new tokens).
song = {
"key": "Cmaj", "time_signature": "4/4", "genre": "funk",
"bars": [["Cmaj7"], ["Fmaj7"]],
}
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])
with torch.no_grad():
for _ in range(32):
logits = model(ids)
next_id = torch.multinomial(
torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1,
)
ids = torch.cat([ids, next_id], dim=-1)
if next_id.item() == tokenizer.eos_id:
break
print(tokenizer.decode(ids[0].tolist()))
Citation
@misc{lee2026chordmix,
title = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
author = {Lee, Jinju},
year = {2026},
eprint = {2605.04998},
archivePrefix = {arXiv}
}
@misc{lee2026chordtimeseries,
title = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling},
author = {Lee, Jinju},
year = {2026},
eprint = {2606.07334},
archivePrefix = {arXiv}
}