DF Arena 1B — Speech Anti-Spoofing Arena results

RAPTOR universal anti-spoofing model. A wav2vec 2.0 XLS-R 1B self-supervised front-end produces per-layer hidden states, which are combined by learnable attention pooling (a layer-wise sigmoid gate over an attention-pooled summary) and passed through a 4-block Conformer head with a class token to a 2-way classifier. Inference runs in FP32 on a deterministic window of the first 64600 samples (~4.04 s at 16 kHz); shorter inputs are tile-repeated to that length, with no random crop and no resampling. The score is softmax(logits)[bonafide], so a higher score means more bona fide. Weights come from the official Speech-Arena-2025/DF_Arena_1B_V_1 checkpoint.
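
The windowing and scoring rules described above can be sketched in a few lines of NumPy. This is only an illustration of the stated behaviour, not the checkpoint's own preprocessing code: the function names are hypothetical, and the position of the bonafide class in the logits is not given here, so it is passed in as a parameter.

```python
import numpy as np

WINDOW = 64600  # samples, about 4.04 s at 16 kHz (value taken from the description)


def fix_length(wave: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Keep the first `window` samples; tile-repeat shorter inputs up to `window`."""
    if wave.ndim != 1 or wave.size == 0:
        raise ValueError("expected a non-empty mono waveform")
    if wave.size >= window:
        return wave[:window]
    repeats = -(-window // wave.size)  # ceiling division
    return np.tile(wave, repeats)[:window]


def bonafide_score(logits: np.ndarray, bonafide_index: int) -> float:
    """softmax(logits)[bonafide_index], computed in a numerically stable way.

    `bonafide_index` is an assumption supplied by the caller; check the
    checkpoint's label mapping before relying on a particular value.
    """
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return float(probs[bonafide_index])
```

Because the window is always the first 64600 samples and padding is a plain repeat, the same file should always yield the same input tensor, which is what makes the reported scores reproducible.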

Paper: arXiv:2603.06164 · Params: 1148M · Checkpoint: SpeechAntiSpoofingBenchmarks/DF_Arena_1B_V_1

Arena standing

Live leaderboard: DF Arena 1B on the Speech Anti-Spoofing Arena

Per-dataset results (24 datasets; mean EER 3.66% over the 22 EER rows)

Dataset Metric Score
J-SPAW_LA EER 0%
ArAD EER 3.72%
DFADD EER 0%
SONAR EER 1.06%
DeepVoice EER 8.29%
EmoFake_test EER 1.9%
LibriSeVoc EER 0.15%
CD-ADD EER 1.72%
ODSS EER 6.03%
InTheWild EER 0.91%
DECRO EER 4.41%
CFAD EER 8.32%
ASVspoof2019_LA EER 1.06%
HABLA EER 0.86%
CVoiceFake_small EER 5.84%
ASVspoof2021_LA EER 4.81%
PyAra EER 4.19%
XMAD EER 1.86%
ASVspoof2021_DF EER 1.88%
ASVspoof5 EER 17.34%
ADD22_eval_31 EER 1.12%
ADD2023_track12_test_r1 EER 5.09%
EmoSpoofTTS 1-SRR 0.39%
LRLspoof 1-SRR 0.45%
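
As a quick arithmetic check, the headline mean can be recomputed from the table. The sketch below assumes the mean is taken over the 22 EER rows only, leaving out the two 1-SRR rows; that reading is an inference from the numbers rather than something stated by the arena.

```python
# EER values (in percent) copied from the table; the two 1-SRR rows are left out.
eer_percent = {
    "J-SPAW_LA": 0.0, "ArAD": 3.72, "DFADD": 0.0, "SONAR": 1.06,
    "DeepVoice": 8.29, "EmoFake_test": 1.9, "LibriSeVoc": 0.15,
    "CD-ADD": 1.72, "ODSS": 6.03, "InTheWild": 0.91, "DECRO": 4.41,
    "CFAD": 8.32, "ASVspoof2019_LA": 1.06, "HABLA": 0.86,
    "CVoiceFake_small": 5.84, "ASVspoof2021_LA": 4.81, "PyAra": 4.19,
    "XMAD": 1.86, "ASVspoof2021_DF": 1.88, "ASVspoof5": 17.34,
    "ADD22_eval_31": 1.12, "ADD2023_track12_test_r1": 5.09,
}

mean_eer = sum(eer_percent.values()) / len(eer_percent)
print(len(eer_percent), round(mean_eer, 2))  # 22 3.66
```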

EER = Equal Error Rate (lower is better). 1-SRR = 1 − SRR, the spoof-only complement of the Spoof Recall Rate, evaluated at the model's own DeepVoice EER operating point (lower is better). All rows are scoring-verified (reproduce --scoring, Δ 0.0) and were computed with the TensorRT engine (parity-verified against PyTorch).
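
For readers unfamiliar with the two metrics, here is a minimal NumPy sketch of how they could be computed from raw scores. It is not the arena's scoring code: the function names are hypothetical, and it assumes that a trial counts as bona fide when its score is at or above the threshold, and that 1-SRR reuses a threshold found on another dataset (here, the DeepVoice EER operating point).

```python
import numpy as np


def eer_and_threshold(bonafide: np.ndarray, spoof: np.ndarray) -> tuple[float, float]:
    """Return (EER, threshold); a trial is accepted as bona fide if score >= threshold."""
    thresholds = np.unique(np.concatenate([bonafide, spoof]))  # sorted candidates
    # False rejection: bona fide scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance: spoof scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))  # point where the two rates are closest
    return float((far[i] + frr[i]) / 2), float(thresholds[i])


def one_minus_srr(spoof: np.ndarray, threshold: float) -> float:
    """Fraction of spoof trials that are NOT flagged as spoof at a fixed threshold."""
    return float((spoof >= threshold).mean())
```

The threshold sweep is quadratic in the number of trials, which is fine for illustration but too slow for large evaluation sets; a sorted cumulative-count implementation would be the usual choice there.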

Usage

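import librosa
from transformers import pipeline

# "antispoofing" is not a built-in transformers task; trust_remote_code=True lets the pipeline code load from the model repo.
pipe = pipeline("antispoofing", model="SpeechAntiSpoofingBenchmarks/DF_Arena_1B_V_1", trust_remote_code=True, device="cuda")

# Load as mono at the 16 kHz rate the model expects.
audio, sr = librosa.load("sample.wav", sr=16000)
print(pipe(audio))   # {'label': 'bonafide'|'spoof', 'all_scores': {...}}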

Citation

@misc{kulkarni2026compactsslbackbonesmatter,
  title={Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR},
  author={Ajinkya Kulkarni and Sandipana Dowerah and Atharva Kulkarni and Tanel Alumäe and Mathew Magimai Doss},
  year={2026},
  eprint={2603.06164},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.06164}
}