---
license: mit
tags:
- audio
- anti-spoofing
- audio-deepfake-detection
- speech
- asvspoof
---

# AASIST

[![EER% 0.83 on ASVspoof2019_LA](https://img.shields.io/badge/EER%25%20on%20ASVspoof2019__LA-0.83%25-brightgreen)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
[![EER% 12.35 on ASVspoof2021_LA](https://img.shields.io/badge/EER%25%20on%20ASVspoof2021__LA-12.35%25-yellow)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
[![EER% 17.04 on ASVspoof2021_DF](https://img.shields.io/badge/EER%25%20on%20ASVspoof2021__DF-17.04%25-yellow)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
[![EER% 43.01 on InTheWild](https://img.shields.io/badge/EER%25%20on%20InTheWild-43.01%25-red)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
[![EER% 51.05 on CD-ADD](https://img.shields.io/badge/EER%25%20on%20CD--ADD-51.05%25-red)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
[![arena tier](https://img.shields.io/endpoint?url=https://speechantispoofingbenchmarks-speechantispoofingarena.hf.space/badge/aasist/tier.json)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)
[![arena rank](https://img.shields.io/endpoint?url=https://speechantispoofingbenchmarks-speechantispoofingarena.hf.space/badge/aasist/rank.json)](https://huggingface.co/spaces/SpeechAntiSpoofingBenchmarks/SpeechAntiSpoofingArena?system=aasist)

AASIST audio anti-spoofing (voice-deepfake detection) countermeasure from *"AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"* (Jung et al., ICASSP 2022). This is the **official `AASIST` variant** (not AASIST-L), using the upstream [clovaai/aasist](https://github.com/clovaai/aasist) ASVspoof2019 LA pretrained checkpoint.
The model takes a raw speech waveform and returns a score where **higher = more bona fide**.

- **Code:** https://github.com/clovaai/aasist
- **Paper:** https://arxiv.org/abs/2110.01200
- **Parameters:** 297,866 (0.298 M)
- **Checkpoint:** [`AASIST.pth`](./AASIST.pth)

This repo is self-contained for inference: the network definition is in [`_net.py`](./_net.py) and the exact wrapper used to produce the Arena scores in [`aasist.py`](./aasist.py).

## Architecture

AASIST operates directly on the raw waveform: a sinc-convolution front-end and a RawNet2-style residual encoder produce a spectro-temporal feature map, which is modelled by heterogeneous stacking graph attention layers over spectral and temporal sub-graphs with a learnable max/average readout, followed by a 2-class output (bona fide vs. spoof). The Arena score is the bona-fide logit.

## Reproducing the Arena scores

Inference uses a deterministic first-64600-sample window (no random crop), matching the upstream `data_utils.pad()` used at eval. Audio is provided as float32 mono at 16 kHz (no resampling in the wrapper).

```python
from aasist import AASIST
m = AASIST(); m.load()
scores = m.score_batch([wav], [16000])   # higher = more bona fide
```

| Dataset | EER % | n_trials |
|---------|------:|---------:|
| ASVspoof2019_LA (in-domain) | 0.83 | 71,237 |
| ASVspoof2021_LA | 12.35 | 181,566 |
| ASVspoof2021_DF | 17.04 | 611,829 |
| InTheWild | 43.01 | 31,779 |
| CD-ADD | 51.05 | 20,786 |

The in-domain ASVspoof2019 LA result reproduces the paper's reported EER (~0.83%).

## License

MIT (inherited from clovaai/aasist; see [`LICENSE`](./LICENSE)).
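
## Input window sketch

The deterministic first-64600-sample window described under "Reproducing the Arena scores" can be illustrated with a short NumPy sketch. It is not taken from `aasist.py`: `fixed_window` is a hypothetical helper name, and the handling of clips shorter than 64600 samples (repeating the waveform until it fills the window) is an assumption about how upstream `data_utils.pad()` behaves, so check it against the upstream code before relying on it.

```python
import numpy as np

MAX_LEN = 64600  # window length in samples (about 4.04 s at 16 kHz)


def fixed_window(wav: np.ndarray, max_len: int = MAX_LEN) -> np.ndarray:
    """Return exactly max_len samples from a 1-D mono waveform.

    Hypothetical helper for illustration; not part of aasist.py.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim != 1 or wav.size == 0:
        raise ValueError("expected a non-empty 1-D mono waveform")
    if wav.size >= max_len:
        # Long clip: keep the first max_len samples (no random crop).
        return wav[:max_len]
    # Short clip: repeat the waveform until it covers max_len, then trim.
    # ASSUMPTION: repeat-padding; verify against upstream data_utils.pad().
    repeats = max_len // wav.size + 1
    return np.tile(wav, repeats)[:max_len]
```

Because the window always starts at sample 0, scoring the same file twice should give the same input tensor, which is what makes the Arena numbers reproducible.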