	---
	license: mit
	tags:
	- audio
	- anti-spoofing
	- audio-deepfake-detection
	- speech
	- asvspoof
	---

	# AASIST

	[![EER% 0.83 on ASVspoof2019_LA](https://img.shields.io/badge/EER%25%20on%20ASVspoof2019__LA-0.83%25-brightgreen)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
	[![EER% 12.35 on ASVspoof2021_LA](https://img.shields.io/badge/EER%25%20on%20ASVspoof2021__LA-12.35%25-yellow)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
	[![EER% 17.04 on ASVspoof2021_DF](https://img.shields.io/badge/EER%25%20on%20ASVspoof2021__DF-17.04%25-yellow)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
	[![EER% 43.01 on InTheWild](https://img.shields.io/badge/EER%25%20on%20InTheWild-43.01%25-red)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
	[![EER% 51.05 on CD-ADD](https://img.shields.io/badge/EER%25%20on%20CD--ADD-51.05%25-red)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
	[![arena tier](https://img.shields.io/endpoint?url=https://speechantispoofingbenchmarks-speechantispoofingarena.hf.space/badge/aasist/tier.json)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
	[![arena rank](https://img.shields.io/endpoint?url=https://speechantispoofingbenchmarks-speechantispoofingarena.hf.space/badge/aasist/rank.json)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)

	AASIST is an audio anti-spoofing (voice-deepfake detection) countermeasure from
	*"AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention
	Networks"* (Jung et al., ICASSP 2022). This card covers the official `AASIST`
	variant (not AASIST-L) and uses the upstream
	[clovaai/aasist](https://github.com/clovaai/aasist) ASVspoof2019 LA pretrained
	checkpoint. The model takes a raw speech waveform and returns a single score;
	higher values mean the input is more likely bona fide.

	- Code: https://github.com/clovaai/aasist
	- Paper: https://arxiv.org/abs/2110.01200
	- Parameters: 297,866 (0.298 M)
	- Checkpoint: [`AASIST.pth`](./AASIST.pth)

	This repo is self-contained for inference: the network definition is in
	[`_net.py`](./_net.py), and the exact wrapper used to produce the Arena scores
	is in [`aasist.py`](./aasist.py).

	## Architecture

	AASIST operates directly on the raw waveform: a sinc-convolution front-end and a
	RawNet2-style residual encoder produce a spectro-temporal feature map, which is
	modelled by heterogeneous stacking graph attention layers over spectral and
	temporal sub-graphs with a learnable max/average readout, followed by a 2-class
	output (bona fide vs. spoof). The Arena score is the bona-fide logit.
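
As a small illustration of the last sentence, the sketch below picks the bona fide logit out of a batch of 2-class outputs without applying a softmax. The column index (`BONAFIDE_INDEX = 1`) and the helper name `bonafide_scores` are assumptions made for this example; check [`aasist.py`](./aasist.py) for the convention the wrapper actually uses.

```python
import numpy as np

# Assumption for illustration: column 1 holds the bona fide logit.
BONAFIDE_INDEX = 1


def bonafide_scores(logits):
    """Return the raw bona fide logit for each row of an (N, 2) logit array."""
    logits = np.asarray(logits, dtype=np.float32)
    if logits.ndim != 2 or logits.shape[1] != 2:
        raise ValueError("expected logits of shape (N, 2)")
    # No softmax: the score is the unnormalised logit itself.
    return logits[:, BONAFIDE_INDEX]


example_logits = np.array([[2.0, -1.0], [-0.5, 3.0]], dtype=np.float32)
print(bonafide_scores(example_logits).tolist())  # [-1.0, 3.0]
```

Because the score is a raw logit rather than a probability, it is only meaningful relative to a threshold chosen on held-out data.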

	## Reproducing the Arena scores

	Inference uses a deterministic window of the first 64,600 samples (about 4.04 s
	at 16 kHz; no random crop), matching the upstream `data_utils.pad()` used at
	evaluation. Audio must be supplied as float32 mono at 16 kHz; the wrapper does
	not resample.

	```python
	from aasist import AASIST

	# `wav` must already be a float32 mono waveform sampled at 16 kHz.
	m = AASIST()
	m.load()  # load the pretrained checkpoint
	scores = m.score_batch([wav], [16000])  # higher = more bona fide
	```
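
The fixed-window preprocessing described above can be sketched in NumPy as follows. This is an illustrative re-implementation, not the wrapper's code: the function name `fixed_window` is hypothetical, and the repeat-padding of clips shorter than the window is an assumption based on how the upstream `pad()` helper is understood to behave.

```python
import numpy as np

WINDOW = 64600  # samples; about 4.04 s at 16 kHz


def fixed_window(wav, max_len=WINDOW):
    """Return exactly max_len samples: truncate long clips, tile short ones."""
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim != 1 or wav.size == 0:
        raise ValueError("expected a non-empty mono waveform")
    if wav.size >= max_len:
        # Deterministic: always keep the first max_len samples (no random crop).
        return wav[:max_len]
    # Assumed behaviour for short clips: repeat the clip until it fills the window.
    repeats = max_len // wav.size + 1
    return np.tile(wav, repeats)[:max_len]


short_clip = np.arange(4, dtype=np.float32)
print(fixed_window(short_clip, max_len=10).tolist())
# [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0, 0.0, 1.0]
```

Taking the first window rather than a random crop means repeated runs on the same file give the same score.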

	| Dataset | EER % | n_trials |
	|---------|------:|---------:|
	| ASVspoof2019_LA (in-domain) | 0.83 | 71,237 |
	| ASVspoof2021_LA | 12.35 | 181,566 |
	| ASVspoof2021_DF | 17.04 | 611,829 |
	| InTheWild | 43.01 | 31,779 |
	| CD-ADD | 51.05 | 20,786 |

	The in-domain ASVspoof2019 LA result reproduces the paper's reported EER (~0.83%).
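
For readers who want to check an EER figure against their own score files, the sketch below computes EER with a plain threshold sweep. It is a common approximation, not necessarily the Arena's implementation; the function name `compute_eer` and the label convention (1 = bona fide, 0 = spoof) are assumptions for this example.

```python
import numpy as np


def compute_eer(scores, labels):
    """Approximate EER in percent. Higher score = more bona fide; label 1 = bona fide."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels).astype(bool)
    bona = np.sort(scores[labels])
    spoof = np.sort(scores[~labels])
    if bona.size == 0 or spoof.size == 0:
        raise ValueError("need at least one bona fide and one spoof trial")
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    # A trial is accepted when score >= threshold.
    # False rejection rate: fraction of bona fide trials scoring below the threshold.
    frr = np.searchsorted(bona, thresholds, side="left") / bona.size
    # False acceptance rate: fraction of spoof trials scoring at or above the threshold.
    far = 1.0 - np.searchsorted(spoof, thresholds, side="left") / spoof.size
    # EER is taken where the two error rates are closest, averaging the pair.
    idx = int(np.argmin(np.abs(far - frr)))
    return 100.0 * (far[idx] + frr[idx]) / 2.0


scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(compute_eer(scores, labels), 2))  # 33.33
```

An EER near 50%, as on CD-ADD, means the scores separate bona fide from spoofed speech no better than chance on that dataset.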

	## License

	MIT (inherited from clovaai/aasist; see [`LICENSE`](./LICENSE)).