DF Arena 500M — Speech Anti-Spoofing Arena results

RAPTOR universal anti-spoofing model. A wav2vec 2.0 XLS-R 300M self-supervised front-end whose per-layer hidden states are combined by learnable attention pooling (a layer-wise sigmoid gate over an attention-pooled summary). The combined representation passes through a 4-block Conformer head with a class token and ends in a 2-way classifier. Inference runs in FP32 on a deterministic window: the first 64600 samples (~4.04 s at 16 kHz), tile-repeated if the clip is shorter, with no random crop and no resampling. score = softmax(logits)[bonafide]; higher means more bona fide. Weights are the official Speech-Arena-2025/DF_Arena_500M_V_1 checkpoint.
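The windowing and scoring rules described above can be sketched in a few lines of NumPy. This is an illustration only, not the checkpoint's own preprocessing code: the helper names `fix_length` and `bonafide_score` are hypothetical, and the position of the bona fide class in the logits (`bonafide_index=1`) is an assumption that should be checked against the model's label mapping.

```python
import numpy as np

TARGET_LEN = 64600  # samples, about 4.04 s at 16 kHz (value from the description above)

def fix_length(wave, target_len=TARGET_LEN):
    """Keep the first target_len samples; tile-repeat shorter clips to fill the window."""
    wave = np.asarray(wave, dtype=np.float32)
    if wave.ndim != 1 or wave.size == 0:
        raise ValueError("expected a non-empty mono waveform")
    if wave.size >= target_len:
        return wave[:target_len]          # deterministic: always the first window
    repeats = -(-target_len // wave.size)  # ceiling division
    return np.tile(wave, repeats)[:target_len]

def bonafide_score(logits, bonafide_index=1):
    """softmax(logits)[bonafide]. bonafide_index=1 is an assumption, not read from the checkpoint."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                        # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs[bonafide_index])
```

Because the window is always the first 64600 samples and short clips are tiled rather than zero-padded, the same file should yield the same score on every run.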

Paper: arXiv:2603.06164 · Params: 436M · Checkpoint: SpeechAntiSpoofingBenchmarks/DF_Arena_500M_V_1

Arena standing

Live leaderboard: DF Arena 500M on the Speech Anti-Spoofing Arena

Per-dataset results (24 datasets; mean EER 5.09% over the 22 EER rows)

Dataset	Metric	Score
J-SPAW_LA	EER	0%
ArAD	EER	9.01%
DFADD	EER	0%
SONAR	EER	2.11%
DeepVoice	EER	9.71%
EmoFake_test	EER	2.63%
LibriSeVoc	EER	0.11%
CD-ADD	EER	2.46%
ODSS	EER	8.4%
InTheWild	EER	1.87%
DECRO	EER	4.33%
CFAD	EER	8%
ASVspoof2019_LA	EER	1.19%
HABLA	EER	3.27%
CVoiceFake_small	EER	7.9%
ASVspoof2021_LA	EER	5.78%
PyAra	EER	15.96%
XMAD	EER	2.83%
ASVspoof2021_DF	EER	3.5%
ASVspoof5	EER	13.43%
ADD22_eval_31	EER	1.97%
ADD2023_track12_test_r1	EER	7.44%
EmoSpoofTTS	1-SRR	3.1%
LRLspoof	1-SRR	1.61%
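The headline mean can be checked directly from the rows above. The sketch below copies the 22 EER values from the table and averages them; leaving out the two 1-SRR rows is an assumption about how the mean was formed, and it is the reading that matches the stated figure.

```python
# EER values in percent, copied from the table above (1-SRR rows excluded).
eer_percent = {
    "J-SPAW_LA": 0.0, "ArAD": 9.01, "DFADD": 0.0, "SONAR": 2.11,
    "DeepVoice": 9.71, "EmoFake_test": 2.63, "LibriSeVoc": 0.11,
    "CD-ADD": 2.46, "ODSS": 8.4, "InTheWild": 1.87, "DECRO": 4.33,
    "CFAD": 8.0, "ASVspoof2019_LA": 1.19, "HABLA": 3.27,
    "CVoiceFake_small": 7.9, "ASVspoof2021_LA": 5.78, "PyAra": 15.96,
    "XMAD": 2.83, "ASVspoof2021_DF": 3.5, "ASVspoof5": 13.43,
    "ADD22_eval_31": 1.97, "ADD2023_track12_test_r1": 7.44,
}

# Unweighted mean across datasets: every dataset counts equally.
mean_eer = sum(eer_percent.values()) / len(eer_percent)
print(len(eer_percent), round(mean_eer, 2))  # → 22 5.09
```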

EER = Equal Error Rate (lower is better). 1-SRR = spoof-only complement of the Spoof Recall Rate, measured at the model's own DeepVoice EER operating point (lower is better). All rows are scoring-verified (reproduce --scoring, Δ 0.0) and were computed with the TensorRT engine (parity-verified against PyTorch).
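For readers unfamiliar with the two metrics, here is a minimal NumPy sketch of how they could be computed from raw scores. It is not the arena's scoring code. The function names are hypothetical, the tie handling and the "accept when score >= threshold" convention are assumptions, and `one_minus_srr` reflects one reading of the definition above: the fraction of spoof clips still accepted at a threshold fixed elsewhere, for example at the DeepVoice EER point.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold); a clip is accepted as bona fide when score >= threshold."""
    bona = np.sort(np.asarray(bonafide_scores, dtype=np.float64))
    spoof = np.sort(np.asarray(spoof_scores, dtype=np.float64))
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.append(np.unique(np.concatenate([bona, spoof])), np.inf)
    # False acceptance rate: spoof clips scoring at or above the threshold.
    far = 1.0 - np.searchsorted(spoof, thresholds, side="left") / spoof.size
    # False rejection rate: bona fide clips scoring below the threshold.
    frr = np.searchsorted(bona, thresholds, side="left") / bona.size
    # Take the threshold where the two error rates are closest and average them.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0), float(thresholds[i])

def one_minus_srr(spoof_scores, threshold):
    """Fraction of spoof clips NOT flagged as spoof at a fixed threshold (assumed reading)."""
    spoof = np.asarray(spoof_scores, dtype=np.float64)
    return float((spoof >= threshold).mean())
```

A spoof-only dataset has no bona fide scores, so an EER cannot be computed on it. That is why the threshold has to be borrowed from another dataset's operating point.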

Usage

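from transformers import pipeline
import librosa

# "antispoofing" is not a built-in transformers task; it is loaded from the model repo,
# which is why trust_remote_code=True is required. Use device="cpu" if no GPU is available.
pipe = pipeline("antispoofing", model="SpeechAntiSpoofingBenchmarks/DF_Arena_500M_V_1", trust_remote_code=True, device="cuda")
# Load as 16 kHz mono: the model applies no resampling of its own.
audio, sr = librosa.load("sample.wav", sr=16000)
print(pipe(audio))   # {'label': 'bonafide'|'spoof', 'all_scores': {...}}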

Citation

@misc{kulkarni2026compactsslbackbonesmatter,
  title={Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR},
  author={Ajinkya Kulkarni and Sandipana Dowerah and Atharva Kulkarni and Tanel Alumäe and Mathew Magimai Doss},
  year={2026},
  eprint={2603.06164},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.06164}
}

Paper: Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR · arXiv:2603.06164 · Published Mar 6