Leveraging Whisper Embeddings for Audio-based Lyrics Matching

Paper: arXiv:2510.08176

WEALY model: wealy-sbert-lyc
This is a WEALY (WEakly-supervised Audio-LYrics) model for music version identification.
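Music version identification of this kind is typically framed as retrieval: each track is mapped to an embedding, and versions of a query are found by ranking candidates by similarity. As a minimal sketch of that ranking step (a hypothetical helper for illustration, not code from this repository, which may use a different similarity or aggregation):

```python
import numpy as np

def rank_versions(query_emb: np.ndarray, candidate_embs: np.ndarray) -> np.ndarray:
    """Rank candidate tracks by cosine similarity to a query embedding.

    query_emb: (d,) embedding of the query track.
    candidate_embs: (n, d) embeddings of the candidate tracks.
    Returns candidate indices sorted from most to least similar.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of each candidate with the query
    return np.argsort(-sims)
```

The top-ranked candidates are then the presumed versions of the query track.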
```shell
# Download and run inference
python scripts/inference.py \
    model_name=audio-based-lyrics-matching/wealy-sbert-lyc \
    hidden_states=/path/to/your/hidden-states \
    partition=test \
    use_overlapping_chunks=true \
    ngpus=1
```
When running inference, you must provide `hidden_states`; the remaining options have defaults:

- `hidden_states`: Path to the directory of pre-extracted hidden states
- `partition`: Dataset partition to evaluate (default: "test")
- `use_overlapping_chunks`: Enable overlapping chunk evaluation (default: false)
- `chunk_size`: Size of overlapping chunks (default: 1500)
- `overlap_percentage`: Overlap between chunks (default: 0.9)
- `ngpus`: Number of GPUs to use (default: 1)

This model was trained for version identification using the WEALY architecture.
If you use this model, please cite:
```bibtex
@article{mancini2025wealy,
  title={Leveraging Whisper Embeddings for Audio-based Lyrics Matching},
  author={Mancini, Eleonora and Serr{\`a}, Joan and Torroni, Paolo and Mitsufuji, Yuki},
  journal={arXiv preprint arXiv:2510.08176},
  year={2025},
  url={https://github.com/helemanc/audio-based-lyrics-matching}
}
```