system:
  name: "XLSR-SLS"
  slug: "xlsr-sls"
  description: >
    wav2vec 2.0 (XLS-R 300M) self-supervised front-end with the SLS (Sensitive
    Layer Selection) classifier for audio deepfake detection. SLS treats every
    XLS-R transformer layer as a source of distinct discriminative cues: it
    weights each layer's hidden states with a per-layer sigmoid attention gate
    and sums the gated layers into one multi-layer feature. A BN + max-pool +
    two-layer MLP head then emits a 2-way log-softmax. Uses the official
    QiShanZhang/SLSforASVspoof-2021-DF checkpoint (model_15, dev-EER 1.45%),
    trained on ASVspoof2019 LA; inference runs in FP32 on a deterministic
    first-64600-sample window (no random crop).
  code: "https://github.com/QiShanZhang/SLSforASVspoof-2021-DF"
  checkpoint: "https://huggingface.co/SpeechAntiSpoofingBenchmarks/XLSR-SLS"
  params_millions: 340.7900
  paper:
    arxiv_id: "10.1145/3664647.3681345"   # no arXiv exists; ACM MM 2024 DOI (per user decision 2026-06-05)
    url: "https://doi.org/10.1145/3664647.3681345"
    bibtex: |
      @inproceedings{zhang2024audio,
        title={Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier},
        author={Zhang, Qishan and Wen, Shuangbing and Hu, Tao},
        booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
        pages={6765--6773},
        year={2024},
        doi={10.1145/3664647.3681345}
      }
notes: >
  XLS-R 300M (wav2vec 2.0) front-end + SLS (Sensitive Layer Selection) classifier,
  from QiShanZhang/SLSforASVspoof-2021-DF (ACM MM 2024). Architecture is built from
  the base xlsr2_300m.pt model config (shared with the W2V2-AASIST submission),
  then every weight is overwritten by the fine-tuned checkpoint. SLS pools each
  transformer layer's hidden state over time to compute a learned sigmoid gate
  for that layer, scales the layer's hidden states by that gate, and sums the
  gated layers before a small MLP head. Inputs use a deterministic
  first-64600-sample window (no random crop) because the head's fc1 has a fixed
  input size that depends on this length. score = log-softmax output for
  class 1 (bona fide); higher = more bona fide (source main.py:
  batch_score = batch_out[:, 1]).
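The fixed-window constraint, the layer gating, and the scoring convention described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: it assumes the fairseq default wav2vec 2.0 convolutional encoder (kernel/stride pairs (10, 5), then (3, 2) four times, then (2, 2) twice), a 1024-dim hidden size, a non-overlapping 3×3 max-pool in the head, and one linear gate shared across layers. The helpers `num_frames`, `fc1_in_features`, `sls_fuse`, and `bonafide_score` are hypothetical names chosen for illustration.

```python
import math

# Assumed wav2vec 2.0 / XLS-R conv feature encoder as (kernel, stride) pairs
# (fairseq default conv_feature_layers; not stated in this file).
CONV_LAYERS = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2


def num_frames(num_samples: int) -> int:
    """Frames produced by the conv encoder (no padding, floor division)."""
    n = num_samples
    for kernel, stride in CONV_LAYERS:
        n = (n - kernel) // stride + 1
    return n


def fc1_in_features(num_samples: int, hidden: int = 1024, pool: int = 3) -> int:
    """Flattened size after an assumed non-overlapping pool x pool max-pool
    over the (frames, hidden) feature map."""
    return (num_frames(num_samples) // pool) * (hidden // pool)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sls_fuse(layers, w, b):
    """Gate and sum per-layer hidden states (hypothetical sketch of SLS fusion).

    layers: list of L layers, each a T x D list of lists.
    w, b:   weight vector (length D) and bias of the shared linear gate.
    Returns the fused T x D feature and the list of per-layer gates.
    """
    T, D = len(layers[0]), len(layers[0][0])
    fused = [[0.0] * D for _ in range(T)]
    gates = []
    for h in layers:
        # Average over time to get one D-dim summary of this layer.
        pooled = [sum(h[t][d] for t in range(T)) / T for d in range(D)]
        # Linear projection to a scalar, squashed into (0, 1).
        g = sigmoid(sum(wd * pd for wd, pd in zip(w, pooled)) + b)
        gates.append(g)
        # Scale the whole layer by its gate and accumulate across layers.
        for t in range(T):
            for d in range(D):
                fused[t][d] += g * h[t][d]
    return fused, gates


def bonafide_score(logits):
    """Log-softmax of 2-way logits; return the class-1 (bona fide) entry."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[1] - log_z


if __name__ == "__main__":
    print(num_frames(64600))       # 201
    print(fc1_in_features(64600))  # 22847
```

Under these assumptions, a 64600-sample clip at XLS-R's 16 kHz input rate (about 4.04 s) yields 201 encoder frames and a 67 × 341 = 22847-element flattened feature, which illustrates why a different window length would not match fc1. The score is a log-probability, so it is always at most 0, and a higher value means more bona fide.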