---
library_name: pytorch
tags:
  - audio
  - spoofing-detection
  - anti-spoofing
  - wav2vec2
  - aasist
license: apache-2.0
pipeline_tag: audio-classification
model-index:
  - name: spectra_aasist
    results:
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof19_LA
          type: ASVspoof19_LA
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 0.159
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof21_LA
          type: ASVspoof21_LA
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 5.164
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof21_DF
          type: ASVspoof21_DF
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 2.568
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof5
          type: ASVspoof5
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 14.056
      - task:
          type: Speech Antispoofing
        dataset:
          name: ADD2022
          type: ADD2022
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 15.205
      - task:
          type: Speech Antispoofing
        dataset:
          name: In-the-Wild
          type: In-the-Wild
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 1.461
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD2R1
          type: AD2R1
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 0.939
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD2R2
          type: AD2R2
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 1.802
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD3R1
          type: AD3R1
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 6.502
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD3R2
          type: AD3R2
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 14.481
---

# Spectra-AASIST (anti-spoofing: bonafide vs. spoof)

Spectra-AASIST is a speech spoofing detection model (binary classification: bonafide vs. spoof) that operates on raw audio waveforms. Architecture: SSL encoder (Wav2Vec2) → MLP projection → AASIST two-class classifier.

- **Input:** waveform (`float32`) of shape `(batch, num_samples)`, typically sampled at 16 kHz.
- **Output:** logits of shape `(batch, 2)`, where index 0 = spoof and index 1 = bonafide.

On first run, the model automatically downloads the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.

## Evaluation Results

All values are Equal Error Rate (EER, %); lower is better.

| Model | ASVspoof19 LA | ASVspoof21 LA | ASVspoof21 DF | ASVspoof5 | ADD2022 | In-the-Wild | AD2R1 | AD2R2 | AD3R1 | AD3R2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Res2TCNGuard | 7.487 | 19.130 | 19.883 | 37.620 | 49.538 | 49.246 | 34.683 | 35.343 | 48.051 | 39.558 |
| AASIST3 | 27.585 | 37.407 | 33.099 | 41.001 | 47.192 | 39.626 | 36.581 | 37.351 | 41.333 | 44.278 |
| XSLS | 0.231 | 7.714 | 4.220 | 17.688 | 33.951 | 7.453 | 14.386 | 15.743 | 19.368 | 21.095 |
| TCM-ADD | 0.152 | 6.655 | 3.444 | 19.505 | 35.252 | 7.767 | 16.951 | 17.688 | 21.913 | 18.627 |
| DF Arena 1B | 43.793 | 40.137 | 42.994 | 35.333 | 42.139 | 17.598 | 12.442 | 13.292 | 33.381 | 43.42 |
| Spectra-0 | 0.181 | 6.475 | 5.410 | 14.426 | 14.716 | 1.026 | 1.578 | 2.372 | 6.535 | 15.154 |
| Spectra-AASIST | 0.159 | 5.164 | 2.568 | 14.056 | 15.205 | 1.461 | 0.939 | 1.802 | 6.427 | 12.968 |
| Spectra-AASIST3 | 0.723 | 4.506 | 1.998 | 13.82 | 15.187 | 0.961 | 0.727 | 1.806 | 6.502 | 14.481 |

## Quickstart

### Clone from Hugging Face

This repository is hosted on the Hugging Face Hub: https://huggingface.co/lab260/spectra_aasist.

```bash
git lfs install
git clone https://huggingface.co/lab260/spectra_aasist
cd spectra_aasist
```

### Install dependencies

```bash
pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
```

### Single-file inference (example preprocessing)

```python
import random
import torch
import torchaudio
import soundfile as sf

from model import spectra_aasist


def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len >= max_len:
        # Take a random crop of max_len samples
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    # Tile short clips until they reach max_len
    num_repeats = int(max_len / x_len) + 1
    return x.repeat(num_repeats)[:max_len]


def load_audio_mono(path: str) -> torch.Tensor:
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_aasist.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)       # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)
    score_spoof = logits[0, 0].item()
    score_bonafide = logits[0, 1].item()

print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
```
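If you prefer probability-like scores over raw logits, a softmax over the two classes works; the logit values below are made up for illustration, not produced by the model:

```python
import torch

# Hypothetical (spoof, bonafide) logits for one utterance
logits = torch.tensor([[0.3, 2.1]])
probs = torch.softmax(logits, dim=-1)  # sums to 1 along the class axis
p_bonafide = probs[0, 1].item()
```

Note that softmax outputs are not calibrated probabilities; for decisions, thresholding the bonafide logit (next section) is the intended usage.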

## Threshold-based classification (and how to tune it)

In `model.py`, the `SpectraAASIST` class provides `classify()` with a default threshold chosen as optimal for the original training setting:

- **Default threshold:** `-1.140625` (applied to `logit_bonafide = logits[:, 1]`)
- **Note:** this threshold may not be optimal on a different dataset or domain. It is recommended to tune the threshold on your own data using the EER (Equal Error Rate) or a target FAR/FRR operating point.

Example:

```python
with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.140625)  # 1 = bonafide, 0 = spoof
```

### Tuning the threshold via EER (typical workflow)

1. Run the model on a labeled set and collect the bonafide-logit scores (`logits[:, 1]`) for both classes.
2. Compute the EER and the corresponding threshold, then pass that threshold to `classify()`.
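The steps above can be sketched with plain NumPy; the `compute_eer` helper below is illustrative and not part of this repo. It sweeps every observed score as a candidate threshold and picks the one where the false-acceptance rate (spoof scored as bonafide) and false-rejection rate (bonafide scored as spoof) are closest:

```python
import numpy as np


def compute_eer(bonafide_scores: np.ndarray, spoof_scores: np.ndarray):
    """Return (eer, threshold) at the operating point where FAR is closest to FRR.

    Scores are bonafide logits: higher = more likely bonafide.
    """
    # Every observed score is a candidate threshold
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # FRR: fraction of bonafide falling below the threshold (rejected)
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # FAR: fraction of spoof at or above the threshold (accepted)
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2
    return eer, float(thresholds[idx])


# Toy example with perfectly separated scores: EER is 0 at threshold 2.0
eer, thr = compute_eer(np.array([2.0, 3.0, 4.0]), np.array([-3.0, -2.0, -1.0]))
```

On real data the two score distributions overlap, so the EER is positive and the resulting threshold is what you would pass to `classify()` instead of the default `-1.140625`.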

## Limitations and notes

- This is a pre-release model.
- Significantly stronger models are planned for Q3–Q4 2026; stay tuned.

## License

Apache-2.0 (see the `license` field in the model card metadata).

## Contacts

- Telegram: https://t.me/korallll_ai
- Email: k.n.borodin@mtuci.ru
- Website: https://lab260.ru/