# Add meta.yaml (HF download-stats query file) to enable download tracking (commit 2e82427)
system:
  name: "XLSR-SLS"
  slug: "xlsr-sls"
  description: >
    wav2vec 2.0 (XLS-R 300M) self-supervised front-end with the SLS (Sensitive
    Layer Selection) classifier for audio deepfake detection. SLS gates the
    hidden states of all XLS-R transformer layers, each of which contributes
    distinct discriminative cues, with a per-layer sigmoid attention and sums
    the weighted features; a BN + max-pool + two-layer MLP head then emits a
    2-way log-softmax. Official QiShanZhang/SLSforASVspoof-2021-DF checkpoint
    (model_15, dev-EER 1.45%), trained on ASVspoof2019 LA; FP32; deterministic
    first-64600-sample window (no random crop).
  code: "https://github.com/QiShanZhang/SLSforASVspoof-2021-DF"
  checkpoint: "https://huggingface.co/SpeechAntiSpoofingBenchmarks/XLSR-SLS"
  params_millions: 340.7900
paper:
  arxiv_id: "10.1145/3664647.3681345"  # no arXiv exists; ACM MM 2024 DOI (per user decision 2026-06-05)
  url: "https://doi.org/10.1145/3664647.3681345"
  bibtex: |
    @inproceedings{zhang2024audio,
      title={Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier},
      author={Zhang, Qishan and Wen, Shuangbing and Hu, Tao},
      booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
      pages={6765--6773},
      year={2024},
      doi={10.1145/3664647.3681345}
    }
notes: >
  XLS-R 300M (wav2vec 2.0) front-end + SLS (Sensitive Layer Selection) classifier,
  from QiShanZhang/SLSforASVspoof-2021-DF (ACM MM 2024). Architecture is built from
  the base xlsr2_300m.pt model config (shared with the W2V2-AASIST submission),
  then every weight is overwritten by the fine-tuned checkpoint. SLS pools every
  transformer layer's hidden state, gates each by a learned sigmoid attention, and
  fuses them before a small MLP head. Deterministic first-64600-sample window (no
  random crop); the head's fc1 expects this fixed length. score = log-softmax
  output for class 1 (bona fide); higher = more bona fide (source main.py:
  batch_score = batch_out[:, 1]).