Model Card: Spectra-AASIST (anti-spoofing / bonafide vs spoof)

Spectra-AASIST is a model for speech spoofing detection (binary classification: bonafide vs spoof) from raw audio waveforms. Architecture: SSL encoder (Wav2Vec2) → MLP projection → AASIST 2-class classifier.

  • Input: waveform (float32), shape (batch, num_samples) (typically 16 kHz).
  • Output: logits of shape (batch, 2), where index 0 = spoof, index 1 = bonafide.

On first run, the model will automatically download the SSL encoder facebook/wav2vec2-xls-r-300m via transformers.
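Since the model returns raw 2-class logits rather than probabilities, a softmax can be applied to interpret them. A minimal sketch (the dummy `logits` tensor below stands in for a real model output of shape `(batch, 2)`, following the index convention above):

```python
import torch

# Dummy logits standing in for a model output of shape (batch, 2);
# index 0 = spoof, index 1 = bonafide, per the I/O spec above.
logits = torch.tensor([[-2.3, 1.7]])

probs = torch.softmax(logits, dim=-1)  # (batch, 2), each row sums to 1
p_spoof = probs[0, 0].item()
p_bonafide = probs[0, 1].item()
label = "bonafide" if p_bonafide >= p_spoof else "spoof"
```

Note that argmax over the softmax is equivalent to thresholding `logit_bonafide - logit_spoof` at 0; the repo's `classify()` instead thresholds `logit_bonafide` alone (see below).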

Evaluation Results

| Model | ASVspoof19 LA | ASVspoof21 LA | ASVspoof21 DF | ASVspoof5 | ADD2022 | In-the-Wild | AD2R1 | AD2R2 | AD3R1 | AD3R2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Res2TCNGuard | 7.487 | 19.130 | 19.883 | 37.620 | 49.538 | 49.246 | 34.683 | 35.343 | 48.051 | 39.558 |
| AASIST3 | 27.585 | 37.407 | 33.099 | 41.001 | 47.192 | 39.626 | 36.581 | 37.351 | 41.333 | 44.278 |
| XSLS | 0.231 | 7.714 | 4.220 | 17.688 | 33.951 | 7.453 | 14.386 | 15.743 | 19.368 | 21.095 |
| TCM-ADD | 0.152 | 6.655 | 3.444 | 19.505 | 35.252 | 7.767 | 16.951 | 17.688 | 21.913 | 18.627 |
| DF Arena 1B | 43.793 | 40.137 | 42.994 | 35.333 | 42.139 | 17.598 | 12.442 | 13.292 | 33.381 | 43.420 |
| Spectra-0 | 0.181 | 6.475 | 5.410 | 14.426 | 14.716 | 1.026 | 1.578 | 2.372 | 6.535 | 15.154 |
| Spectra-AASIST | 0.159 | 5.164 | 2.568 | 14.056 | 15.205 | 1.461 | 0.939 | 1.802 | 6.427 | 12.968 |
| Spectra-AASIST3 | 0.723 | 4.506 | 1.998 | 13.820 | 15.187 | 0.961 | 0.727 | 1.806 | 6.502 | 14.481 |

Quickstart

Clone from Hugging Face

This repository is hosted on Hugging Face Hub: https://huggingface.co/lab260/spectra_aasist.

git lfs install
git clone https://huggingface.co/lab260/spectra_aasist
cd spectra_aasist

Install dependencies

pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile

Single-file inference (example preprocessing)

import random
import torch
import torchaudio
import soundfile as sf

from model import spectra_aasist


def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len >= max_len:
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    num_repeats = int(max_len / x_len) + 1
    return x.repeat(num_repeats)[:max_len]


def load_audio_mono(path: str) -> torch.Tensor:
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_aasist.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)       # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)
    score_spoof = logits[0, 0].item()
    score_bonafide = logits[0, 1].item()

print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})

Threshold-based classification (and how to tune it)

In model.py, the SpectraAASIST class provides a classify() method whose default threshold was chosen as the “optimal” value for the original evaluation setting:

  • Default threshold: -1.140625 (it thresholds logit_bonafide = logits[:, 1])
  • Note: this threshold may not be optimal on a different dataset/domain. It’s recommended to tune the threshold on your dataset using EER (Equal Error Rate) or a target FAR/FRR.

Example:

with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.140625)  # 1=bonafide, 0=spoof

Tuning the threshold via EER (typical workflow)

  1. Run the model on a labeled set and collect the bonafide-logit scores (logits[:, 1]) for both classes.

  2. Compute the EER and the threshold at which FAR = FRR, then pass that threshold to classify().
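The steps above can be sketched with NumPy. This is a minimal, brute-force EER computation over all observed scores (it assumes higher scores indicate bonafide, matching the logit_bonafide convention; the array names are illustrative):

```python
import numpy as np


def compute_eer(bonafide_scores: np.ndarray, spoof_scores: np.ndarray):
    """Return (eer, threshold), where threshold is the score at which
    FAR and FRR are closest. Higher scores should mean bonafide."""
    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # FRR: fraction of bonafide samples scored below the threshold (rejected).
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # FAR: fraction of spoof samples scored at/above the threshold (accepted).
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2
    return float(eer), float(thresholds[idx])
```

Run compute_eer on your development set's scores and use the returned threshold with classify(). For large score sets, a sorted-array implementation (or sklearn.metrics.roc_curve) avoids the quadratic loop.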

Limitations and notes

  • This is a pre-release model.
  • Significantly stronger models are planned for Q3–Q4 2026 — stay tuned.

License

MIT (see the license field in the model repo header).

Contacts

  • TG channel: https://t.me/korallll_ai
  • Email: k.n.borodin@mtuci.ru
  • Website: https://lab260.ru/
