---
library_name: pytorch
tags:
  - audio
  - spoofing-detection
  - anti-spoofing
  - wav2vec2
  - aasist
license: apache-2.0
pipeline_tag: audio-classification
model-index:
  - name: spectra_aasist
    results:
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof19_LA
          type: ASVspoof19_LA
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 0.159
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof21_LA
          type: ASVspoof21_LA
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 5.164
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof21_DF
          type: ASVspoof21_DF
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 2.568
      - task:
          type: Speech Antispoofing
        dataset:
          name: ASVspoof5
          type: ASVspoof5
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 14.056
      - task:
          type: Speech Antispoofing
        dataset:
          name: ADD2022
          type: ADD2022
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 15.205
      - task:
          type: Speech Antispoofing
        dataset:
          name: In-the-Wild
          type: In-the-Wild
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 1.461
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD2R1
          type: AD2R1
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 0.939
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD2R2
          type: AD2R2
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 1.802
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD3R1
          type: AD3R1
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 6.502
      - task:
          type: Speech Antispoofing
        dataset:
          name: AD3R2
          type: AD3R2
        metrics:
          - name: Equal Error Rate
            type: Equal Error Rate
            value: 14.481
---

# Spectra-AASIST (anti-spoofing: bonafide vs. spoof)

Spectra-AASIST is a speech spoofing detection model (binary classification: bonafide vs. spoof) that operates on raw audio waveforms. Architecture: SSL encoder (Wav2Vec2) → MLP projection → AASIST two-class classifier.

- **Input:** waveform (`float32`) of shape `(batch, num_samples)`, typically sampled at 16 kHz.
- **Output:** logits of shape `(batch, 2)`, where index 0 = spoof and index 1 = bonafide.

On first run, the model automatically downloads the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.

## Evaluation Results

All values are Equal Error Rate (EER, %); lower is better.

| Model | ASVspoof19 LA | ASVspoof21 LA | ASVspoof21 DF | ASVspoof5 | ADD2022 | In-the-Wild | AD2R1 | AD2R2 | AD3R1 | AD3R2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Res2TCNGuard | 7.487 | 19.130 | 19.883 | 37.620 | 49.538 | 49.246 | 34.683 | 35.343 | 48.051 | 39.558 |
| AASIST3 | 27.585 | 37.407 | 33.099 | 41.001 | 47.192 | 39.626 | 36.581 | 37.351 | 41.333 | 44.278 |
| XSLS | 0.231 | 7.714 | 4.220 | 17.688 | 33.951 | 7.453 | 14.386 | 15.743 | 19.368 | 21.095 |
| TCM-ADD | 0.152 | 6.655 | 3.444 | 19.505 | 35.252 | 7.767 | 16.951 | 17.688 | 21.913 | 18.627 |
| DF Arena 1B | 43.793 | 40.137 | 42.994 | 35.333 | 42.139 | 17.598 | 12.442 | 13.292 | 33.381 | 43.42 |
| Spectra-0 | 0.181 | 6.475 | 5.410 | 14.426 | 14.716 | 1.026 | 1.578 | 2.372 | 6.535 | 15.154 |
| Spectra-AASIST | 0.159 | 5.164 | 2.568 | 14.056 | 15.205 | 1.461 | 0.939 | 1.802 | 6.427 | 12.968 |
| Spectra-AASIST3 | 0.723 | 4.506 | 1.998 | 13.82 | 15.187 | 0.961 | 0.727 | 1.806 | 6.502 | 14.481 |

## Quickstart

### Clone from Hugging Face

This repository is hosted on the Hugging Face Hub: https://huggingface.co/lab260/spectra_aasist.

```bash
git lfs install
git clone https://huggingface.co/lab260/spectra_aasist
cd spectra_aasist
```

### Install dependencies

```bash
pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
```

### Single-file inference (example preprocessing)

```python
import random
import torch
import torchaudio
import soundfile as sf

from model import spectra_aasist


def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len >= max_len:
        # Take a random crop of max_len samples
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    # Tile short clips until they reach max_len
    num_repeats = int(max_len / x_len) + 1
    return x.repeat(num_repeats)[:max_len]


def load_audio_mono(path: str) -> torch.Tensor:
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_aasist.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)       # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)
    score_spoof = logits[0, 0].item()
    score_bonafide = logits[0, 1].item()

print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
```
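If you prefer probability-like scores over raw logits, a softmax over the two classes works; the logit values below are made up for illustration, not produced by the model:

```python
import torch

# Hypothetical (spoof, bonafide) logits for one utterance
logits = torch.tensor([[0.3, 2.1]])
probs = torch.softmax(logits, dim=-1)  # sums to 1 along the class axis
p_bonafide = probs[0, 1].item()
```

Note that softmax outputs are not calibrated probabilities; for decisions, thresholding the bonafide logit (next section) is the intended usage.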

## Threshold-based classification (and how to tune it)

In `model.py`, the `SpectraAASIST` class provides `classify()` with a default threshold chosen as optimal for the original training setting:

- **Default threshold:** `-1.140625` (applied to `logit_bonafide = logits[:, 1]`)
- **Note:** this threshold may not be optimal on a different dataset or domain. It is recommended to tune the threshold on your own data using the EER (Equal Error Rate) or a target FAR/FRR operating point.

Example:

```python
with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.140625)  # 1 = bonafide, 0 = spoof
```

### Tuning the threshold via EER (typical workflow)

1. Run the model on a labeled set and collect the bonafide-logit scores (`logits[:, 1]`) for both classes.
2. Compute the EER and the corresponding threshold, then pass that threshold to `classify()`.
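The steps above can be sketched with plain NumPy; the `compute_eer` helper below is illustrative and not part of this repo. It sweeps every observed score as a candidate threshold and picks the one where the false-acceptance rate (spoof scored as bonafide) and false-rejection rate (bonafide scored as spoof) are closest:

```python
import numpy as np


def compute_eer(bonafide_scores: np.ndarray, spoof_scores: np.ndarray):
    """Return (eer, threshold) at the operating point where FAR is closest to FRR.

    Scores are bonafide logits: higher = more likely bonafide.
    """
    # Every observed score is a candidate threshold
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # FRR: fraction of bonafide falling below the threshold (rejected)
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # FAR: fraction of spoof at or above the threshold (accepted)
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2
    return eer, float(thresholds[idx])


# Toy example with perfectly separated scores: EER is 0 at threshold 2.0
eer, thr = compute_eer(np.array([2.0, 3.0, 4.0]), np.array([-3.0, -2.0, -1.0]))
```

On real data the two score distributions overlap, so the EER is positive and the resulting threshold is what you would pass to `classify()` instead of the default `-1.140625`.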

## Limitations and notes

- This is a pre-release model.
- Significantly stronger models are planned for Q3–Q4 2026; stay tuned.

## License

Apache-2.0 (see the `license` field in the model card metadata).

## Contacts

- Telegram: https://t.me/korallll_ai
- Email: k.n.borodin@mtuci.ru
- Website: https://lab260.ru/