TheArtist Music Transformer: LoRA Adapter (Electronic)
LoRA adapter (r=4) that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward electronic chord progressions: minor-key harmony, repetitive vamps, and tonic-centred motion. It is one of eleven per-genre adapters from the paper How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? (Lee, 2026). The released snapshot is the best rank (by minimum validation loss) from a 5-point sweep (r ∈ {4, 8, 16, 32, 64}), and the vocabulary is extended from 351 to 359 tokens with [GENRE:X] tokens shipped in embedding_extension.pt.
Paper · Code · Demo · All models
Base-weights note. The released F1 base is weight-identical to the Phase-0 pop baseline (a checkpoint-selection artifact; see the note on the base card). Every "F1 base" column below was measured against those exact weights, so each Δ shown is the adapter's gain over a pure-pop harmonic prior.
Usage
Requires torch, huggingface_hub, peft, safetensors. Both repos bundle model.py and tokenizer.py, so nothing needs to be cloned from GitHub.
```python
import sys

import torch
import torch.nn as nn
from huggingface_hub import snapshot_download
from peft import PeftModel

# 1. Download the base + LoRA repos. Both bundle model.py and tokenizer.py.
base_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80")
lora_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-electronic")
sys.path.insert(0, base_dir)  # so the next two imports resolve

from model import MusicTransformer
from tokenizer import ChordTokenizer

# 2. Extended tokenizer (351 base + 8 new genre tokens = 359). The PAD id
#    is unchanged across base and extended tokenizers.
tokenizer = ChordTokenizer(include_extra_genres=True)

# 3. Build the model at the BASE vocab size (351) so F1's state_dict loads
#    cleanly; we grow the embedding rows immediately after. Passing the
#    extended tokenizer's pad_id is safe because PAD is shared (see step 2).
BASE_VOCAB = 351
model = MusicTransformer(
    vocab_size=BASE_VOCAB,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
ckpt = torch.load(f"{base_dir}/best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])


# 4. Grow token_emb + out_proj from 351 -> 359 (new rows init from
#    [GENRE:none]), then overlay the LoRA's trained extension rows.
def _grow_to_extended_vocab(m, new_vocab, none_id):
    d = m.token_emb.embedding_dim

    # Input embedding: copy the existing rows, seed each new row from [GENRE:none].
    new_emb = nn.Embedding(new_vocab, d, padding_idx=m.token_emb.padding_idx)
    with torch.no_grad():
        new_emb.weight[:m.token_emb.num_embeddings] = m.token_emb.weight
        for i in range(m.token_emb.num_embeddings, new_vocab):
            new_emb.weight[i] = m.token_emb.weight[none_id]
    m.token_emb = new_emb

    # Output projection: same treatment for the logit rows.
    new_out = nn.Linear(d, new_vocab, bias=False)
    with torch.no_grad():
        new_out.weight[:m.out_proj.out_features] = m.out_proj.weight
        for i in range(m.out_proj.out_features, new_vocab):
            new_out.weight[i] = m.out_proj.weight[none_id]
    m.out_proj = new_out


_grow_to_extended_vocab(model, tokenizer.vocab_size, tokenizer.encode_genre("none"))
ext = torch.load(f"{lora_dir}/embedding_extension.pt",
                 map_location="cpu", weights_only=False)
model.token_emb.load_state_dict(ext["token_emb_state"])
model.out_proj.load_state_dict(ext["out_proj_state"])

# 5. Apply the LoRA adapter (the adapter files live at lora_dir/adapter/).
model = PeftModel.from_pretrained(model, f"{lora_dir}/adapter")
model.eval()

# 6. Generate an electronic continuation. With LoRA injected,
#    PeftModel.forward routes through the adapted attention layers.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "electronic",
    "bars": [["Cmaj7"], ["Fmaj7"]],
}
# Drop the final token (presumably the end-of-sequence marker) so the model continues the song.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])
with torch.no_grad():
    for _ in range(32):
        logits = model(ids)  # routed through LoRA via PeftModel
        # Sample the next token from the last position at temperature 0.8.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1,
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))
```
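The generation loop samples at a fixed temperature of 0.8. If you want to switch between sampling and deterministic greedy decoding, a small helper along these lines could replace the inline `torch.multinomial` call. `sample_next` is a hypothetical name for illustration; it is not shipped in either repo.

```python
import torch


def sample_next(logits: torch.Tensor, temperature: float = 0.8) -> torch.Tensor:
    """Pick the next token id from last-position logits of shape (batch, vocab).

    temperature > 0: sample from softmax(logits / temperature).
    temperature == 0: greedy argmax, so the same prompt always gives the same output.
    Returns a (batch, 1) LongTensor, the same shape torch.multinomial returns.
    """
    if temperature == 0:
        return logits.argmax(dim=-1, keepdim=True)
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

Inside the loop, `next_id = sample_next(logits[:, -1, :], temperature=0.8)` keeps the original behaviour, while `temperature=0` makes the continuation reproducible across runs. Lower temperatures sharpen the distribution toward the most likely chord; higher ones spread probability onto rarer tokens.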
Evaluation
Teacher-forced token-level metrics on the electronic val split (1,519 sequences, no key augmentation). Both columns use the same dataloader and start from the same [GENRE:none]-initialised embedding extension; they differ only in the adapter weights and in the trained extension rows that the LoRA column overlays.
| Metric | F1 base | F1 + this LoRA (r=4) | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 84.50 | 86.42 | +1.92 |
| Top-5 accuracy (%) | 95.93 | 97.53 | +1.60 |
| Cross-entropy loss | 0.6835 | 0.4737 | -0.2098 |
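Teacher-forced metrics of this kind can be computed roughly as follows. This is a sketch, not the paper's evaluation script: it assumes PAD targets are excluded and that accuracy and loss are averaged over all remaining tokens (micro-averaged). `teacher_forced_metrics` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def teacher_forced_metrics(logits, targets, pad_id):
    """Top-1 / top-5 accuracy (%) and mean cross-entropy over non-PAD target tokens.

    logits:  (batch, seq_len, vocab) model outputs for the input tokens.
    targets: (batch, seq_len) the next token at each position (input shifted by one).
    """
    vocab = logits.size(-1)
    flat_logits = logits.reshape(-1, vocab)
    flat_targets = targets.reshape(-1)
    keep = flat_targets != pad_id  # drop PAD positions from every metric
    flat_logits, flat_targets = flat_logits[keep], flat_targets[keep]

    loss = F.cross_entropy(flat_logits, flat_targets).item()
    top5 = flat_logits.topk(k=min(5, vocab), dim=-1).indices  # (n_tokens, 5), best first
    top1_acc = (top5[:, 0] == flat_targets).float().mean().item() * 100
    top5_acc = (top5 == flat_targets.unsqueeze(-1)).any(dim=-1).float().mean().item() * 100
    return {"top1": top1_acc, "top5": top5_acc, "loss": loss}


# Usage sketch with the model from the Usage section (ids: (batch, seq_len) token ids):
#   logits = model(ids[:, :-1])
#   metrics = teacher_forced_metrics(logits, ids[:, 1:], tokenizer.pad_id)
```

"Teacher-forced" means every prediction is conditioned on the ground-truth prefix rather than on the model's own earlier samples, so these numbers measure next-token fit, not free-running generation quality.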
Rank sweep the released adapter was selected from (minimum val loss, top-1 tiebreak):
| Rank | val_loss | Top-1 (%) | Δ Top-1 vs F1 |
|---|---|---|---|
| r=4 | 0.4737 | 86.42 | +1.92 (released) |
| r=8 | 0.4742 | 87.24 | +2.74 |
| r=16 | 0.4745 | 86.42 | +1.92 |
| r=32 | 0.4739 | 86.43 | +1.93 |
| r=64 | 0.4762 | 86.42 | +1.92 |
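The selection rule can be written as a sort key over the sweep results copied from the table above. Assuming the top-1 tiebreak applies only to exactly equal losses, it picks r=4 even though r=8 has the highest top-1 (87.24%).

```python
# Rank-sweep results from the table: rank -> (val_loss, top-1 %).
SWEEP = {
    4: (0.4737, 86.42),
    8: (0.4742, 87.24),
    16: (0.4745, 86.42),
    32: (0.4739, 86.43),
    64: (0.4762, 86.42),
}


def select_rank(sweep):
    """Lowest val loss wins; among equal losses, the higher top-1 wins."""
    return min(sweep, key=lambda r: (sweep[r][0], -sweep[r][1]))


print(select_rank(SWEEP))  # 4
```

All five val losses sit within 0.0025 of each other, so on this genre the rank choice barely moves loss; loss-first selection simply favours the smallest adapter here.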
Real-song check: mean over 10 held-out electronic songs (25–84 bars each; titles are given as Spotify track IDs per the upstream Chordonomicon policy, but the progressions are from real songs):
| Model | Top-1 (%) | Top-5 (%) | Loss |
|---|---|---|---|
| F1 base | 87.39 | 98.45 | 0.5072 |
| F1 + this LoRA | 87.93 | 98.49 | 0.4122 |
| Δ | +0.54 | +0.04 | -0.0950 |
The 10 songs are this genre's slice of a 130-song eval set (10 per genre × 13 genres, seed 42) drawn only from the held-out val/test partitions: pop from McGill Billboard (CC0), jazz from public standards corpora, classical from Bach chorales, and the other ten genres from the matching Chordonomicon subsets (CC BY-NC 4.0).
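A seeded per-genre draw of this shape could be sketched as follows. The pools, the genre names, and the use of one shared `random.Random(42)` visited in sorted genre order are assumptions for illustration; the card states only 10 songs per genre, 13 genres, held-out partitions, and seed 42. `draw_eval_set` is a hypothetical helper.

```python
import random


def draw_eval_set(heldout_by_genre, per_genre=10, seed=42):
    """Draw a fixed number of held-out songs per genre with one seeded RNG.

    heldout_by_genre: mapping genre -> song IDs taken from val/test partitions only.
    Genres and pools are sorted first so the draw does not depend on input order.
    """
    rng = random.Random(seed)
    eval_set = {}
    for genre in sorted(heldout_by_genre):
        pool = sorted(heldout_by_genre[genre])
        eval_set[genre] = rng.sample(pool, per_genre)  # without replacement
    return eval_set


# Toy usage: 13 hypothetical genres with 30 held-out songs each -> 130 songs.
pools = {f"genre{i}": [f"g{i}-song{j}" for j in range(30)] for i in range(13)}
print(sum(len(v) for v in draw_eval_set(pools).values()))  # 130
```

Drawing only from val/test keeps every real-song check out of the training data, and fixing the seed means all models are scored on the same 130 songs.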
Training data
15,196 songs: the electronic subset of the Chordonomicon dataset, with a song-level 80/10/10 split (seed 42) and 12-key augmentation on train. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.
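Splitting at the song level before augmenting matters: if transposed copies were created first, the same song in another key could land in both train and val. A minimal sketch of that order of operations follows. The helper names, the sharp-only spelling, and the chord regex are assumptions; the sketch ignores the song's key metadata and will not reproduce the authors' exact partition. With floor rounding, an 80/10/10 split of 15,196 songs gives 12,156 / 1,519 / 1,521, consistent in size with the 1,519-sequence val split used in the evaluation.

```python
import random
import re

# Pitch classes of the natural note letters; accidentals shift by one semitone.
LETTER_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTAL = {"": 0, "#": 1, "b": -1}
SHARP_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# A note name: uppercase letter A-G plus an optional # or b. Qualities such as
# "maj7", "m7b5" or "sus4" contain no uppercase A-G, so only root and bass match.
NOTE_RE = re.compile(r"([A-G])([#b]?)")


def song_level_split(song_ids, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle whole songs once, then cut train/val/test (remainder goes to test)."""
    ids = sorted(song_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(train_frac * len(ids))
    n_val = int(val_frac * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def transpose_chord(chord, semitones):
    """Shift the root (and any slash bass) by `semitones`, spelling results with sharps."""
    def shift(match):
        pc = LETTER_PC[match.group(1)] + ACCIDENTAL[match.group(2)]
        return SHARP_NAMES[(pc + semitones) % 12]
    return NOTE_RE.sub(shift, chord)


def augment_12_keys(bars):
    """Return all 12 transpositions (0..11 semitones) of a list of bars of chords."""
    return [[[transpose_chord(c, k) for c in bar] for bar in bars] for k in range(12)]


train, val, test = song_level_split(range(15196))
print(len(train), len(val), len(test))  # 12156 1519 1521
```

Only the train IDs would then be expanded with `augment_12_keys`; val and test stay in their original keys, matching the "no key augmentation" note on the evaluation split.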
Adapter: LoRA on the Q/K/V projections (w_q, w_k, w_v), r=4, α=8, dropout 0.05; ~99K trainable parameters (0.4% of base); adapter file 0.4 MB. Best checkpoint by minimum val loss.
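The parameter count can be checked with a little arithmetic. This sketch assumes w_q, w_k and w_v are square 512 × 512 linear layers in each of the 8 layers (d_model=512, n_layers=8 as in the Usage code) and that the 8 new embedding rows are not counted. In PEFT terms the setup presumably corresponds to `LoraConfig(r=4, lora_alpha=8, lora_dropout=0.05, target_modules=["w_q", "w_k", "w_v"])`, but the adapter_config.json in the repo is the authoritative record.

```python
def lora_param_count(d_model: int, rank: int, n_layers: int, n_targets: int) -> int:
    """Trainable LoRA parameters when every target is a d_model x d_model Linear.

    Each adapted projection gets A (rank x d_model) and B (d_model x rank).
    """
    per_projection = rank * d_model + d_model * rank
    return per_projection * n_targets * n_layers


n = lora_param_count(d_model=512, rank=4, n_layers=8, n_targets=3)  # w_q, w_k, w_v
print(n)                        # 98304
print(f"{n * 4 / 1e6:.2f} MB")  # fp32 storage: 0.39 MB
print(8 / 4)                    # LoRA scaling alpha / r: 2.0
```

About 98K parameters at 4 bytes each is roughly 0.39 MB, in line with the ~99K count and 0.4 MB file size above. Because the count is linear in r, the r=64 adapter from the sweep would carry 16× as many trainable parameters.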
License
CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Research, paper replication, portfolio, and demo use are permitted; commercial use is not.
Citation
```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}

@misc{lee2026chordtimeseries,
  title         = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2606.07334},
  archivePrefix = {arXiv}
}
```