sukhdeveyash committed · Commit afa0a32 · verified · 1 Parent(s): deccf56

Add model card with training config and results

Files changed (1)
  1. README.md +134 -0
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ license: mit
+ tags:
+ - audio
+ - deepfake-detection
+ - anti-spoofing
+ - wav2vec2
+ - xlsr
+ - speech
+ - asvspoof
+ datasets:
+ - asvspoof2019
+ - asvspoof2021
+ metrics:
+ - equal_error_rate
+ pipeline_tag: audio-classification
+ language:
+ - en
+ library_name: pytorch
+ ---
+
+ # XLS-R + SLS Classifier for Audio Deepfake Detection
+
+ Reproduction of **"Audio Deepfake Detection with XLS-R and SLS Classifier"** (Zhang et al., ACM Multimedia 2024).
+
+ The Selective Layer Summarization (SLS) classifier extracts attention-weighted features from all 24 transformer layers of [XLS-R 300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) (wav2vec 2.0), then classifies bonafide vs. spoofed speech via a lightweight fully-connected head. [RawBoost](https://arxiv.org/abs/2301.00693) (algo=3, SSI) data augmentation is applied during training.
+
+ ## Available Checkpoints
+
+ | File | Experiment | Description |
+ |------|-----------|-------------|
+ | `v1/epoch_2.pth` | v1 (baseline) | Best cross-domain generalization. Patience=1, no validation, 4 epochs. |
+ | `v2/epoch_16.pth` | v2 (val-based) | Validation early stopping. Patience=10, ASVspoof2019 LA dev validation, 27 epochs. |
+
+ **Recommended**: Use `v1/epoch_2.pth`. It generalizes better to evaluation sets outside the training domain (ASVspoof 2021 DF, In-the-Wild).
+
+ ## Results
+
+ | Track | Paper EER (%) | v1 EER (%) | v2 EER (%) |
+ |-------|--------------|------------|------------|
+ | ASVspoof 2021 DF | 1.92 | **2.14** | 3.75 |
+ | ASVspoof 2021 LA | 2.87 | 3.51 | **3.47** |
+ | In-the-Wild | 7.46 | **7.84** | 12.67 |
+
+ Bold marks the better of the two reproduced checkpoints on each track. v1 comes within 0.22 to 0.64 percentage points of the paper's EER on every track. v2 improves LA slightly (from 3.51 to 3.47) but degrades DF (from 2.14 to 3.75) and In-the-Wild (from 7.84 to 12.67) due to overfitting to the LA validation domain, a well-documented cross-domain generalization problem in audio deepfake detection ([Müller et al., Interspeech 2022](https://arxiv.org/abs/2203.16263)).
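
For readers unfamiliar with the metric: the EER reported above is the error rate at the decision threshold where false acceptances of spoofed audio and false rejections of bonafide audio are equal. The following is a minimal pure-Python sketch, not the evaluation script used to produce these results. The function name `compute_eer`, the convention that higher scores mean "bonafide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (as a fraction) from two score lists.

    Assumption: a higher score means "more likely bonafide".
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bonafide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up for illustration, not from any checkpoint).
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(compute_eer(bonafide, spoof))  # 0.25
```

This brute-force scan is quadratic in the number of trials, so it suits small examples only; evaluation on full ASVspoof trial lists would normally use a sorted, vectorized implementation.
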
+
+ ## Training Configuration
+
+ Both experiments share the following setup:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Training data | ASVspoof2019 LA train (25,380 utterances) |
+ | Loss | Weighted Cross-Entropy [0.1, 0.9] |
+ | Optimizer | Adam (lr=1e-6, weight_decay=1e-4) |
+ | Batch size | 5 |
+ | RawBoost | algo=3 (SSI) |
+ | Seed | 1234 |
+ | SSL backbone | XLS-R 300M (frozen feature extractor) |
+ | GPU | NVIDIA RTX 4080 (16 GB) |
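
To make the "Weighted Cross-Entropy [0.1, 0.9]" entry concrete, here is a small pure-Python sketch of how class weights enter the loss. It follows the weighted-mean reduction that PyTorch's `CrossEntropyLoss` uses by default when a `weight` is given (weighted sum of per-sample losses divided by the sum of the applied weights). The helper name `weighted_cross_entropy`, the toy logits, and the class order (index 0 weighted 0.1, index 1 weighted 0.9) are assumptions for illustration; check the training script for the actual label mapping.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights=(0.1, 0.9)):
    """Class-weighted cross-entropy with weighted-mean reduction.

    logits: list of per-sample score lists, one score per class.
    targets: list of integer class indices.
    """
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_norm - row[target]  # negative log-likelihood of the target
        w = class_weights[target]
        total += w * nll
        weight_sum += w
    return total / weight_sum


# Toy batch: one confident mistake on class 1, one uncertain class-0 sample.
logits = [[0.0, 0.0], [10.0, 0.0]]
targets = [0, 1]
print(weighted_cross_entropy(logits, targets))              # dominated by the class-1 error
print(weighted_cross_entropy(logits, targets, (0.5, 0.5)))  # equal weights, for comparison
```

With weights of 0.1 and 0.9, an error on the class weighted 0.9 contributes nine times as much as the same error on the other class, which counteracts class imbalance in the training set.
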
+
+ ### v1 specifics
+ - Early stopping: patience=1 on training loss
+ - No validation set
+ - 4 epochs trained, best at epoch 2 (train loss = 0.000661)
+
+ ### v2 specifics
+ - Early stopping: patience=10 on validation loss
+ - Validation: ASVspoof2019 LA dev (24,844 trials)
+ - 27 epochs trained, best at epoch 16 (val_loss = 0.000468, val_acc = 99.99%)
+ - Bug fixes: `torch.no_grad()` in validation loop, correct `best_val_loss` tracking
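
The patience settings above can be illustrated with a short early-stopping sketch. This is not the repository's training loop: the function name `best_epoch_with_patience` and the loss values are hypothetical, and the rule of stopping once the number of non-improving epochs exceeds `patience` is an assumption, chosen because it is consistent with the reported counts (best epoch 2 with 4 epochs run at patience=1, best epoch 16 with 27 epochs run at patience=10).

```python
def best_epoch_with_patience(losses, patience):
    """Return (best_epoch, epochs_run), both 1-indexed, for a monitored loss.

    Assumed rule: stop once the count of consecutive non-improving epochs
    exceeds `patience`.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement > patience:
                return best_epoch, epoch
    return best_epoch, len(losses)


# Hypothetical loss curve (not the real training log).
toy_losses = [0.5, 0.1, 0.2, 0.3, 0.05]
print(best_epoch_with_patience(toy_losses, patience=1))  # (2, 4)
```

In the toy curve the loss would have improved again at epoch 5, which a patience of 1 never sees; this is the trade-off behind the larger patience used in v2.
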
+
+ ## Usage
+
+ ### Download checkpoint
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download v1 checkpoint (recommended)
+ checkpoint_path = hf_hub_download(
+     repo_id="sukhdeveyash/XLS-R-SLS-Deepfake-Detection",
+     filename="v1/epoch_2.pth"
+ )
+
+ # Download v2 checkpoint
+ # checkpoint_path = hf_hub_download(
+ #     repo_id="sukhdeveyash/XLS-R-SLS-Deepfake-Detection",
+ #     filename="v2/epoch_16.pth"
+ # )
+ ```
+
+ ### Load and run inference
+
+ ```python
+ import torch
+ from model import Model  # from the GitHub repo
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # `checkpoint_path` comes from the download step above;
+ # `xlsr2_300m.pt` is the fairseq XLS-R 300M base checkpoint (see Requirements).
+ model = Model(device=device, ssl_cpkt_path="xlsr2_300m.pt")
+ model.load_state_dict(torch.load(checkpoint_path, map_location=device))
+ model = model.to(device)
+ model.eval()  # switch off dropout and other training-only behavior
+ ```
+
+ Full training and evaluation code: [GitHub Repository](https://github.com/Yash-Sukhdeve/XLS-R-SLS-Deepfake-Detection)
+
+ ## Requirements
+
+ - Python 3.7+
+ - PyTorch 1.13.1 (CUDA 11.7)
+ - fairseq (commit a54021305d6b3c)
+ - XLS-R 300M base checkpoint (`xlsr2_300m.pt`) from [fairseq](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec/xlsr)
+
+ See `environment.yml` in the [GitHub repo](https://github.com/Yash-Sukhdeve/XLS-R-SLS-Deepfake-Detection) for the full environment.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{zhang2024audio,
+   title={Audio Deepfake Detection with XLS-R and SLS Classifier},
+   author={Zhang, Qishan and Wen, Shuangbing and Hu, Tao},
+   booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
+   year={2024},
+   publisher={ACM}
+ }
+ ```
+
+ ## Acknowledgements
+
+ - [XLS-R](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec/xlsr) (Babu et al., 2022)
+ - [RawBoost](https://arxiv.org/abs/2301.00693) (Tak et al., Odyssey 2022)
+ - [ASVspoof Challenge](https://www.asvspoof.org/)