lilyBERT

lilyBERT is a masked language model for LilyPond music notation, built by adapting CodeBERT to the musical domain.

LilyPond is a text-based music engraving language with a formal grammar, block structure, and backslash commands, making it structurally similar to a programming language. lilyBERT leverages this by extending CodeBERT's vocabulary with 115 domain-specific tokens (e.g. \trill, \fermata, \mordent, \staccato) and performing MLM pre-training on curated Baroque music scores.
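The vocabulary arithmetic behind the extension can be sketched as follows. This is an illustrative snippet, not the actual training code; the token list is a small excerpt of the full 115, and in practice the extension corresponds to the tokenizer's add_tokens followed by resizing the model's embedding matrix.

```python
# Sketch of the vocabulary extension. In transformers this corresponds
# to tokenizer.add_tokens(...) followed by
# model.resize_token_embeddings(len(tokenizer)).
BASE_VOCAB_SIZE = 50_265  # CodeBERT (RoBERTa) base vocabulary

# Excerpt of the 115 LilyPond command tokens added to the vocabulary.
MUSIC_TOKENS = ["\\trill", "\\fermata", "\\mordent", "\\staccato"]
NUM_MUSIC_TOKENS = 115    # full count, per the table below

def extended_vocab_size(base: int, added: int) -> int:
    """Vocabulary size after appending domain-specific tokens."""
    return base + added

print(extended_vocab_size(BASE_VOCAB_SIZE, NUM_MUSIC_TOKENS))  # 50380
```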

Training

This checkpoint was trained in two stages:

  1. Stage 1 โ€” PDMX pre-training: CodeBERT fine-tuned on the PDMX corpus of automatically converted LilyPond files.
  2. Stage 2 โ€” BMdataset fine-tuning: Further fine-tuned on the BMdataset, a musicologically curated collection of 470 Baroque scores in LilyPond format (90M tokens).

| Hyperparameter | Value |
|---|---|
| Architecture | RobertaForMaskedLM (12 layers, 768 hidden, 12 heads) |
| Vocab size | 50,380 (50,265 base + 115 music tokens) |
| Max sequence length | 512 |
| MLM probability | 0.15 |
| Batch size | 72 × 2 GPUs × 2 grad. accum. = 288 |
| Learning rate | 2e-4 (cosine schedule) |
| Warmup | 10% |
| Epochs | 10 (early stopping, patience 5) |
| Precision | bf16 |
| Optimizer | AdamW (fused) |
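In Hugging Face terms, the table roughly corresponds to a TrainingArguments like the following. This is an illustrative sketch, not the released training script: the output path is an assumption, the 0.15 MLM probability belongs in the data collator rather than these arguments, and early stopping would be added via a trainer callback.

```python
from transformers import TrainingArguments

# Sketch of an MLM training configuration matching the table above.
# Effective batch size: 72 per device × 2 GPUs × 2 accumulation = 288.
training_args = TrainingArguments(
    output_dir="lilybert-mlm",        # assumed path
    per_device_train_batch_size=72,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.10,
    num_train_epochs=10,
    bf16=True,
    optim="adamw_torch_fused",
)
```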

Results

Linear probing on the out-of-domain Mutopia corpus (layer 6, 5-fold CV):

| Model | Composer Acc. | Style Acc. |
|---|---|---|
| CB + PDMX_full (15B tokens) | 80.8 | 82.6 |
| CB + BMdataset (90M tokens) | 82.9 | 83.7 |
| CB + PDMX_90M (90M tokens) | 81.7 | 82.3 |
| CB + PDMX → BM (this model) | 84.3 | 82.9 |

90M tokens of expertly curated data outperform 15B tokens of automatically converted data. Combining broad pre-training with domain-specific fine-tuning yields the best overall composer accuracy.
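The probing protocol can be sketched as below: a logistic-regression probe on frozen embeddings, scored with 5-fold cross-validation. The random features stand in for real layer-6 lilyBERT embeddings, so the accuracy it prints is meaningless; only the protocol shape is illustrated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Linear-probing sketch: random features stand in for frozen layer-6
# embeddings (768-dim), labels stand in for composer classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 768))   # 200 scores × 768-dim embeddings
y = rng.integers(0, 4, size=200)      # e.g. 4 composer classes

probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```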

Usage

from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("csc-unipd/lilybert")
model = AutoModelForMaskedLM.from_pretrained("csc-unipd/lilybert")

Fill-mask example

from transformers import pipeline

filler = pipeline("fill-mask", model="csc-unipd/lilybert")
filler("\\relative c' { c4 d <mask> f | g2 g }")

Feature extraction

import torch

inputs = tokenizer("\\relative c' { c4 d e f | g2 g }", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Layer 6 embeddings (best for linear probing)
embeddings = outputs.hidden_states[6]
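To get a single vector per score, the token embeddings can be pooled; mask-aware mean pooling is one common choice (an assumption here, not a documented part of the probing setup). The dummy tensors below stand in for the model outputs above so the snippet is self-contained.

```python
import torch

# Mask-aware mean pooling: average token embeddings into one vector
# per score, ignoring padding positions. Dummy tensors stand in for
# `embeddings` and `inputs["attention_mask"]` from the snippet above.
embeddings = torch.randn(2, 12, 768)   # (batch, seq, hidden)
attention_mask = torch.tensor([[1] * 12, [1] * 8 + [0] * 4])

mask = attention_mask.unsqueeze(-1).float()            # (batch, seq, 1)
pooled = (embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(pooled.shape)  # torch.Size([2, 768])
```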

Citation

@misc{spanio2026bmdataset,
  title     = {BMdataset: A Musicologically Curated LilyPond Dataset},
  author    = {Spanio, Matteo and Guler, Ilay and Roda, Antonio},
  year      = {2026},
  note      = {Under review},
}

License

Apache-2.0
