| system: |
| name: "Nes2Net" |
| slug: "nes2net" |
| description: > |
| wav2vec 2.0 (XLS-R 300M) self-supervised front-end fine-tuned end-to-end with |
| a Nes2Net-X (Nested Res2Net TDNN) back-end for speech anti-spoofing. The |
| nested Res2Net structure couples multi-scale residual groups with |
| squeeze-and-excitation and removes the dimensionality-reducing neck layers; |
| mean temporal pooling feeds a linear classifier. The back-end has only ~0.51M |
| parameters. Official single Nes2Net-X checkpoint (reported EER: ASVspoof2021 |
| LA 1.73%, DF 1.65%), trained on ASVspoof2019 LA with RawBoost augmentation; |
| scored in FP32 on a deterministic first-64600-sample window (no random crop). |
| code: "https://github.com/Liu-Tianchi/Nes2Net_ASVspoof_ITW" |
| checkpoint: "https://huggingface.co/SpeechAntiSpoofingBenchmarks/Nes2Net" |
| params_millions: 317.9026 |
| paper: |
| arxiv_id: "2504.05657" |
| url: "https://arxiv.org/abs/2504.05657" |
| bibtex: | |
| @article{Nes2Net, |
| author={Liu, Tianchi and Truong, Duc-Tuan and Das, Rohan Kumar and Lee, Kong Aik and Li, Haizhou}, |
| journal={IEEE Transactions on Information Forensics and Security}, |
| title={Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-Spoofing}, |
| year={2025}, |
| volume={20}, |
| pages={12005--12018}, |
| doi={10.1109/TIFS.2025.3626963} |
| } |
| notes: > |
| XLS-R 300M (wav2vec 2.0) front-end + Nes2Net-X back-end, the single (non-averaged) |
| checkpoint from Liu-Tianchi/Nes2Net_ASVspoof_ITW (Nes_ratio [8,8], SE_ratio [1], |
| pool_func 'mean', dilation 2). The architecture is built from the base |
| xlsr2_300m.pt model config; every weight is then overwritten by the fine-tuned |
| checkpoint. Input is the deterministic first 64600 samples (~4 s at 16 kHz; no |
| random crop), matching the source data_utils_SSL.py::pad used at eval (default |
| --test_protocol 4sec). Score = output logit for class 1 (bona fide); higher |
| means more likely bona fide. Back-end params ~0.51M; params_millions counts |
| the full deployed model, including the XLS-R front-end. |
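The deterministic evaluation window can be illustrated with a minimal sketch. It assumes that `data_utils_SSL.py::pad` truncates long clips to their first 64600 samples and tiles (repeats) shorter clips until they fill the window, which is the behavior of the `pad()` helper in ASVspoof SSL baselines. `pad_first_window` is a hypothetical name. Check the repository before relying on the short-clip behavior.

```python
import numpy as np


def pad_first_window(x, max_len=64600):
    """Deterministic fixed-length window: always keep the start of the clip.

    Assumption (hypothetical helper): clips shorter than max_len are tiled
    (repeated) rather than zero-padded, mirroring the pad() helper used in
    ASVspoof SSL baselines.
    """
    x = np.asarray(x)
    if x.ndim != 1 or x.shape[0] == 0:
        raise ValueError("expected a non-empty 1-D waveform")
    if x.shape[0] >= max_len:
        return x[:max_len]  # truncate: no random crop, always the first samples
    num_repeats = max_len // x.shape[0] + 1
    return np.tile(x, num_repeats)[:max_len]  # repeat, then cut to max_len


if __name__ == "__main__":
    sr = 16000  # XLS-R operates on 16 kHz audio
    print(64600 / sr)  # window length in seconds: 4.0375
    long_clip = np.arange(100000, dtype=np.float32)
    short_clip = np.arange(3, dtype=np.float32)
    print(pad_first_window(long_clip).shape, pad_first_window(short_clip, max_len=7))
```

Because the window always starts at sample 0, repeated runs on the same file produce identical inputs, and therefore identical scores.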
| |
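The score convention (class-1 logit, higher = more bona fide) is what an EER computation consumes. The sketch below assumes the model returns a `(batch, 2)` logit array with column 1 as bona fide, as stated in the notes. `bona_fide_scores` and `compute_eer` are hypothetical helpers. The threshold-sweep EER is a simple approximation, not the official ASVspoof evaluation script.

```python
import numpy as np


def bona_fide_scores(logits):
    """Return the class-1 (bona fide) logit per utterance; higher = more bona fide."""
    logits = np.asarray(logits, dtype=np.float64)
    if logits.ndim != 2 or logits.shape[1] != 2:
        raise ValueError("expected logits of shape (batch, 2)")
    return logits[:, 1]


def compute_eer(bona_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted as bona fide when score >= threshold. Returns
    (eer, threshold) where miss and false-alarm rates are closest.
    """
    bona = np.asarray(bona_scores, dtype=np.float64)
    spoof = np.asarray(spoof_scores, dtype=np.float64)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # Miss rate: bona fide trials rejected (score below threshold).
    frr = np.array([np.mean(bona < t) for t in thresholds])
    # False-alarm rate: spoofed trials accepted (score at or above threshold).
    far = np.array([np.mean(spoof >= t) for t in thresholds])
    idx = int(np.argmin(np.abs(frr - far)))
    return float((frr[idx] + far[idx]) / 2), float(thresholds[idx])


if __name__ == "__main__":
    logits = np.array([[0.1, 2.0], [1.5, -0.3], [-0.5, 3.0], [2.2, 0.4]])
    labels = np.array([1, 0, 1, 0])  # illustrative: 1 = bona fide, 0 = spoof
    scores = bona_fide_scores(logits)
    eer, thr = compute_eer(scores[labels == 1], scores[labels == 0])
    print(f"EER={eer:.3f} at threshold {thr:.2f}")
```

Flipping the sign convention (for example, scoring with the class-0 logit) would invert the accept/reject decision. The column choice therefore has to match the "higher = more bona fide" rule.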