---
library_name: pytorch
tags:
- audio
- spoofing-detection
- anti-spoofing
- wav2vec2
- aasist
license: apache-2.0
pipeline_tag: audio-classification
model-index:
- name: spectra_aasist
  results:
  - task:
      type: Speech Antispoofing
    dataset:
      name: ASVspoof19_LA
      type: ASVspoof19_LA
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 0.159
  - task:
      type: Speech Antispoofing
    dataset:
      name: ASVspoof21_LA
      type: ASVspoof21_LA
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 5.164
  - task:
      type: Speech Antispoofing
    dataset:
      name: ASVspoof21_DF
      type: ASVspoof21_DF
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 2.568
  - task:
      type: Speech Antispoofing
    dataset:
      name: ASVspoof5
      type: ASVspoof5
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 14.056
  - task:
      type: Speech Antispoofing
    dataset:
      name: ADD2022
      type: ADD2022
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 15.205
  - task:
      type: Speech Antispoofing
    dataset:
      name: In-the-Wild
      type: In-the-Wild
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 1.461
  - task:
      type: Speech Antispoofing
    dataset:
      name: AD2R1
      type: AD2R1
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 0.939
  - task:
      type: Speech Antispoofing
    dataset:
      name: AD2R2
      type: AD2R2
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 1.802
  - task:
      type: Speech Antispoofing
    dataset:
      name: AD3R1
      type: AD3R1
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 6.502
  - task:
      type: Speech Antispoofing
    dataset:
      name: AD3R2
      type: AD3R2
    metrics:
    - name: Equal Error Rate
      type: Equal Error Rate
      value: 14.481
---

	## Model Card: Spectra-AASIST (anti-spoofing / bonafide vs spoof)

	`Spectra-AASIST` is a model for speech spoofing detection (binary classification: `bonafide` vs `spoof`) from raw audio waveforms. Architecture: SSL encoder (`Wav2Vec2`) → MLP projection → `AASIST` 2-class classifier.

- Input: waveform (float32), shape `(batch, num_samples)` (typically 16 kHz).
- Output: logits of shape `(batch, 2)`, where index 0 = spoof, index 1 = bonafide.

	On first run, the model will automatically download the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.

	## Evaluation Results

All values are Equal Error Rate (EER, %); lower is better.

| Model | ASVspoof19 LA | ASVspoof21 LA | ASVspoof21 DF | ASVspoof5 | ADD2022 | In-the-Wild | AD2R1 | AD2R2 | AD3R1 | AD3R2 |
|-----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| [Res2TCNGuard](https://github.com/mtuciru/Res2TCNGuard) | 7.487 | 19.130 | 19.883 | 37.620 | 49.538 | 49.246 | 34.683 | 35.343 | 48.051 | 39.558 |
| [AASIST3](https://huggingface.co/lab260/AASIST3) | 27.585 | 37.407 | 33.099 | 41.001 | 47.192 | 39.626 | 36.581 | 37.351 | 41.333 | 44.278 |
| [XSLS](https://github.com/QiShanZhang/SLSforASVspoof-2021-DF) | 0.231 | 7.714 | 4.220 | 17.688 | 33.951 | 7.453 | 14.386 | 15.743 | 19.368 | 21.095 |
| [TCM-ADD](https://github.com/ductuantruong/tcm_add) | 0.152 | 6.655 | 3.444 | 19.505 | 35.252 | 7.767 | 16.951 | 17.688 | 21.913 | 18.627 |
| [DF Arena 1B](https://huggingface.co/Speech-Arena-2025/DF_Arena_1B_V_1) | 43.793 | 40.137 | 42.994 | 35.333 | 42.139 | 17.598 | 12.442 | 13.292 | 33.381 | 43.42 |
| [Spectra-0](https://huggingface.co/lab260/spectra_0) | 0.181 | 6.475 | 5.410 | 14.426 | 14.716 | 1.026 | 1.578 | 2.372 | 6.535 | 15.154 |
| Spectra-AASIST | 0.159 | 5.164 | 2.568 | 14.056 | 15.205 | 1.461 | 0.939 | 1.802 | 6.427 | 12.968 |
| [Spectra-AASIST3](https://huggingface.co/lab260/Spectra-AASIST3) | 0.723 | 4.506 | 1.998 | 13.82 | 15.187 | 0.961 | 0.727 | 1.806 | 6.502 | 14.481 |


	## Quickstart

	### Clone from Hugging Face

	This repository is hosted on Hugging Face Hub: `https://huggingface.co/lab260/spectra_aasist`.

	```bash
	git lfs install
	git clone https://huggingface.co/lab260/spectra_aasist
	cd spectra_aasist
	```

	### Install dependencies

	```bash
	pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
	```

	### Single-file inference (example preprocessing)

```python
import random

import soundfile as sf
import torch
import torchaudio

from model import spectra_aasist  # model.py from the cloned repository


def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    """Crop a random max_len window, or tile a shorter clip up to max_len."""
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len >= max_len:
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    # Repeat the clip enough times to cover max_len, then trim the excess.
    num_repeats = max_len // x_len + 1
    return x.repeat(num_repeats)[:max_len]


def load_audio_mono(path: str) -> torch.Tensor:
    """Read an audio file as a mono float32 tensor sampled at 16 kHz."""
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_aasist.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)  # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)
    score_spoof = logits[0, 0].item()
    score_bonafide = logits[0, 1].item()

print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
```
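
The scores printed above are raw, unbounded logits. If a probability-like value is easier to work with, a two-class softmax can be applied to them. The sketch below is dependency-free; the helper name `logits_to_probs` and the example logits are made up for illustration, and the result is only a calibrated probability if the model was trained with a softmax cross-entropy objective, which this card does not state.

```python
import math


def logits_to_probs(logit_spoof: float, logit_bonafide: float) -> dict:
    """Two-class softmax over (spoof, bonafide) logits, computed stably."""
    m = max(logit_spoof, logit_bonafide)  # subtract the max to avoid overflow
    e_spoof = math.exp(logit_spoof - m)
    e_bonafide = math.exp(logit_bonafide - m)
    total = e_spoof + e_bonafide
    return {"p_spoof": e_spoof / total, "p_bonafide": e_bonafide / total}


# Hypothetical logits; in practice use logits[0, 0] and logits[0, 1].
probs = logits_to_probs(logit_spoof=-2.0, logit_bonafide=1.5)
label = "bonafide" if probs["p_bonafide"] >= probs["p_spoof"] else "spoof"
print(probs, label)
```

Picking the larger probability is equivalent to an argmax over the logits, which is not the same decision as thresholding the bonafide logit at a tuned value.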

	## Threshold-based classification (and how to tune it)

	In `model.py`, the `SpectraAASIST` class provides `classify()` with a default threshold chosen as an “optimal” value for the original setting:

	- Default threshold: `-1.140625` (it thresholds `logit_bonafide = logits[:, 1]`)
	- Note: this threshold may not be optimal on a different dataset/domain. It’s recommended to tune the threshold on your dataset using EER (Equal Error Rate) or a target FAR/FRR.

	Example:

```python
with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.140625)  # 1=bonafide, 0=spoof
```
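
When the operating point is defined by a target FAR rather than by the EER, the threshold can be read directly off the spoof scores. The following is a minimal sketch, assuming that a trial is accepted as bonafide when its bonafide logit is strictly greater than the threshold (the exact comparison used by `classify()` is not documented here). The function names and the score lists are hypothetical.

```python
def threshold_for_target_far(spoof_scores, target_far):
    """Return a threshold whose FAR on spoof_scores does not exceed target_far.

    Assumed decision rule: accept as bonafide when score > threshold.
    """
    if not spoof_scores:
        raise ValueError("need at least one spoof score")
    if not 0.0 <= target_far <= 1.0:
        raise ValueError("target_far must be in [0, 1]")
    ranked = sorted(spoof_scores, reverse=True)
    # At most this many spoof trials may score strictly above the threshold.
    max_accepted = int(target_far * len(ranked))
    # The (max_accepted + 1)-th highest score has at most max_accepted above it.
    return ranked[min(max_accepted, len(ranked) - 1)]


def far_frr(bonafide_scores, spoof_scores, threshold):
    """False acceptance and false rejection rates at a given threshold."""
    far = sum(s > threshold for s in spoof_scores) / len(spoof_scores)
    frr = sum(s <= threshold for s in bonafide_scores) / len(bonafide_scores)
    return far, frr


# Made-up bonafide logits (logits[:, 1]) collected on a labeled set.
spoof = [-4.0, -3.2, -2.5, -2.0, -1.8, -1.5, -1.0, -0.4, 0.3, 1.1]
bonafide = [-1.2, -0.2, 0.5, 0.9, 1.4, 2.0, 2.6, 3.1]

thr = threshold_for_target_far(spoof, target_far=0.1)
far, frr = far_frr(bonafide, spoof, thr)
print(thr, far, frr)
```

The resulting value can then be passed as `threshold=` to `classify()`. With small labeled sets the achievable FAR values are coarse, so the realized FAR may fall below the target.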

	### Tuning the threshold via EER (typical workflow)

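1) Run the model on a labeled set and collect the bonafide logit (`logits[:, 1]`) for every bonafide and every spoof utterance.

2) Compute the EER and the threshold at which the false acceptance rate (FAR) equals the false rejection rate (FRR), then pass that threshold to `classify()`.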
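
The two steps above can be sketched as follows. This is a dependency-free illustration, not the procedure used to produce the reported numbers: it assumes higher bonafide logits mean "more bonafide", that a trial is accepted when its score is strictly greater than the threshold, and it uses a simple quadratic-time sweep over candidate thresholds. The function name `compute_eer` and the score lists are hypothetical.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Assumed decision rule: accept as bonafide when score > threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need scores for both classes")
    # Every observed score is a candidate threshold.
    candidates = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for thr in candidates:
        far = sum(s > thr for s in spoof_scores) / len(spoof_scores)
        frr = sum(s <= thr for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # On finite data FAR and FRR rarely match exactly; report their mean.
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Made-up bonafide logits (logits[:, 1]) from a labeled development set.
bonafide = [-1.0, 0.5, 1.2, 2.0, 2.4, 3.1]
spoof = [-3.0, -2.8, -2.2, -1.5, -0.5, 0.8]

eer, thr = compute_eer(bonafide, spoof)
print(f"EER = {eer:.4f} at threshold {thr}")
```

For large evaluation sets a sorted sweep or an existing ROC implementation (for example `sklearn.metrics.roc_curve`) is a better fit than this loop. The tuned value is then used as `model.classify(audio, threshold=thr)`.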

	## Limitations and notes

	- This is a pre-release model.
	- Significantly stronger models are planned for Q3–Q4 2026 — stay tuned.

	## License

Apache-2.0 (see the `license` field in the model repo header).

	## Contacts

- Telegram channel: https://t.me/korallll_ai
- Email: k.n.borodin@mtuci.ru
- Website: https://lab260.ru/