TuneJury: An Open Metric for Improving Music Generation Preference Alignment
Paper • 2606.17006 • Published
TuneJury is a 2.8M-parameter MLP head over frozen LAION-CLAP-Music + MERT-v1-330M embeddings, trained with a shared-weight pairwise-logistic objective on ~17.5K human A vs. B preferences from four open sources. It scores a single audio clip (with an optional text prompt) as one preference scalar.
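The pairwise-logistic objective named above can be illustrated with a minimal sketch. This is not the TuneJury training code: the function names `pairwise_logistic_loss` and `mean_pair_loss` and the toy linear head are hypothetical, and the sketch only assumes the standard form of the loss, in which one head scores both clips of a pair (the "shared-weight" part) and the loss is the negative log-sigmoid of the score margin.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def pairwise_logistic_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(s_preferred - s_rejected)).

    Computed as softplus(-margin) in a numerically stable form, so very
    large margins do not overflow.
    """
    margin = score_preferred - score_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_pair_loss(
    head: Callable[[Vector], float],
    pairs: Sequence[Tuple[Vector, Vector]],
) -> float:
    """Average the loss over (preferred, rejected) embedding pairs.

    The same `head` scores both sides of every pair, which is what
    "shared-weight" means in this sketch.
    """
    losses = [pairwise_logistic_loss(head(a), head(b)) for a, b in pairs]
    return sum(losses) / len(losses)


# Hypothetical stand-in for the MLP head: a fixed linear scorer over
# 3-d toy "embeddings" (the real head consumes CLAP + MERT features).
def toy_head(x: Vector) -> float:
    weights = (0.5, -1.0, 2.0)
    return sum(w * v for w, v in zip(weights, x))


toy_pairs = [((1.0, 0.0, 1.0), (0.0, 1.0, 0.0)),
             ((0.0, 0.0, 0.5), (1.0, 1.0, 0.0))]
print(mean_pair_loss(toy_head, toy_pairs))
```

The loss is small when the human-preferred clip receives the higher score and grows roughly linearly with the margin when the order is wrong, which is why a single scalar per clip is enough to reproduce A vs. B judgments.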
| File | Encoder / mix | License |
|---|---|---|
| `tunejury.pt` | CLAP+MERT, 4-dataset (primary) | CC-BY-NC 4.0 |
| `tunejury_muq_leave_MA.pt` | MuQ-MuLan-large encoder-swap | CC-BY-NC 4.0 |
| `A1_clap_audio_only.pt` | CLAP-audio-only | Apache-2.0 |
| `tunejury_leave_*.pt` | leave-one / leave-two-out (fair-eval) | CC-BY-NC 4.0 |
Install the package. The LAION-CLAP encoder (~2.2 GB) is downloaded on first use, and
ffmpeg and libsndfile must be available on the system:
```bash
pip install git+https://github.com/yonghyunk1m/TuneJury
```
```python
from huggingface_hub import hf_hub_download
from tunejury.score import Scorer

# Download the primary head from the Hub and load it.
sc = Scorer.from_pretrained(hf_hub_download("TuneJury/tunejury", "tunejury.pt"))

print(sc.score("clip.wav", ""))  # "" -> 512-d zero text vector (paper §3/§4.2 empty-prompt)
print(sc.score("clip.wav", "a calm lo-fi piano loop"))  # a prompt uses the text branch
```
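Because each clip gets one scalar, comparing candidates reduces to sorting by score. The helpers below are a hedged sketch rather than part of the `tunejury` package: `rank_clips` and `preference_probability` are hypothetical names, the scorer is passed in as a plain callable, and the sketch assumes its return value can be converted with `float()`. Reading the sigmoid of a score gap as a preference probability follows from the pairwise-logistic training objective, but its calibration on new audio is an assumption, not a documented guarantee.

```python
import math
from typing import Callable, Iterable, List, Tuple


def rank_clips(
    score_fn: Callable[[str, str], float],
    paths: Iterable[str],
    prompt: str = "",
) -> List[Tuple[str, float]]:
    """Score every clip against the same prompt and sort best-first."""
    scored = [(path, float(score_fn(path, prompt))) for path in paths]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def preference_probability(score_a: float, score_b: float) -> float:
    """Sigmoid of the score gap, written to avoid overflow for large gaps."""
    gap = score_a - score_b
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    exp_gap = math.exp(gap)
    return exp_gap / (1.0 + exp_gap)
```

With the scorer loaded as `sc`, a call such as `rank_clips(sc.score, ["take1.wav", "take2.wav"], "a calm lo-fi piano loop")` would return the clips ordered from most to least preferred; the file names here are placeholders.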
The released head is CC-BY-NC 4.0, tracking the strictest upstream weight license
(MERT-v1-330M). Frozen encoders at inference: LAION-CLAP-Music (CC0 1.0),
MERT-v1-330M (CC-BY-NC 4.0), MuQ-MuLan-large (CC-BY-NC 4.0). The
A1_clap_audio_only head is released under Apache-2.0.
```bibtex
@misc{tunejury2026,
  title         = {TuneJury: An Open Metric for Improving Music Generation Preference Alignment},
  author        = {Kim, Yonghyun and Lee, Junwon and Xia, Haiwen and
                   Ma, Yinghao and Koo, Junghyun and Saito, Koichi and
                   Mitsufuji, Yuki and Donahue, Chris},
  year          = {2026},
  eprint        = {2606.17006},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2606.17006},
}
```