wh1tet3a committed on
Commit 366f7db · verified · 1 Parent(s): afbebe3

Update README.md

Files changed (1)
  1. README.md +115 -115
README.md CHANGED
@@ -1,115 +1,115 @@
- ---
- license: mit
- library_name: pytorch
- tags:
- - audio
- - spoofing-detection
- - anti-spoofing
- - wav2vec2
- - ecapa-tdnn
- ---
-
- ## Model Card: Spectra-0 (anti-spoofing / bonafide vs spoof)
-
- `Spectra-0` is a model for **speech spoofing detection** (binary classification: `bonafide` vs `spoof`) from **raw audio waveforms**. Architecture: SSL encoder (`Wav2Vec2`) → MLP projection → `ECAPA-TDNN` 2-class classifier.
-
- - **Input**: waveform \(float32\), shape `(batch, num_samples)` (typically 16 kHz).
- - **Output**: logits of shape `(batch, 2)`, where **index 0 = spoof**, **index 1 = bonafide**.
-
- On first run, the model will automatically download the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.
-
- ## Quickstart
-
- ### Clone from Hugging Face
-
- This repository is hosted on Hugging Face Hub: `https://huggingface.co/MTUCI/spectra_0`.
-
- ```bash
- git lfs install
- git clone https://huggingface.co/MTUCI/spectra_0
- cd spectra_0
- ```
-
- ### Install dependencies
-
- ```bash
- pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
- ```
-
- ### Single-file inference (example preprocessing)
-
- ```python
- import random
- import torch
- import torchaudio
- import soundfile as sf
-
- from model import spectra_0
-
-
- def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
-     # x: (num_samples,) or (1, num_samples)
-     if x.ndim > 1:
-         x = x.squeeze()
-     x_len = x.shape[0]
-     if x_len >= max_len:
-         start = random.randint(0, x_len - max_len)
-         return x[start:start + max_len]
-     num_repeats = int(max_len / x_len) + 1
-     return x.repeat(num_repeats)[:max_len]
-
-
- def load_audio_mono(path: str) -> torch.Tensor:
-     audio, sr = sf.read(path, dtype="float32")
-     audio = torch.from_numpy(audio)
-     if audio.ndim > 1:
-         # (num_samples, channels) -> mono
-         audio = audio.mean(dim=1)
-     if sr != 16000:
-         audio = torchaudio.functional.resample(audio, sr, 16000)
-     return audio
-
-
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = spectra_0.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)
-
- audio = load_audio_mono("path/to/audio.wav")
- audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
- audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)  # (1, 64600)
-
- with torch.inference_mode():
-     logits = model(audio.to(device))  # (1, 2)
-     score_spoof = logits[0, 0].item()
-     score_bonafide = logits[0, 1].item()
-
- print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
- ```
-
- ## Threshold-based classification (and how to tune it)
-
- In `model.py`, the `Spectra0Model` class provides `classify()` with a **default threshold** chosen as an “optimal” value for the original setting:
-
- - **Default threshold**: `-1.0625009` (it thresholds `logit_bonafide = logits[:, 1]`)
- - **Note**: this threshold **may not be optimal** on a different dataset/domain. It’s recommended to tune the threshold on your dataset using **EER** (Equal Error Rate) or a target FAR/FRR.
-
- Example:
-
- ```python
- with torch.inference_mode():
-     pred = model.classify(audio.to(device), threshold=-1.0625009)  # 1=bonafide, 0=spoof
- ```
-
- ### Tuning the threshold via EER (typical workflow)
-
- 1) Run the model on a labeled set and collect scores for both classes (e.g., store `score_bonafide = logits[:, 1]` for each sample).
-
- 2) Compute EER and the threshold
-
- ## Limitations and notes
-
- - This is a **pre-release** model.
- - Significantly stronger models are planned for **Q3–Q4 2026** — stay tuned.
-
- ## License
-
- MIT (see the `license` field in the model repo header).
 
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - audio
+ - spoofing-detection
+ - anti-spoofing
+ - wav2vec2
+ - ecapa-tdnn
+ ---
+
+ ## Model Card: Spectra-0 (anti-spoofing / bonafide vs spoof)
+
+ `Spectra-0` is a model for **speech spoofing detection** (binary classification: `bonafide` vs `spoof`) from **raw audio waveforms**. Architecture: SSL encoder (`Wav2Vec2`) → MLP projection → `ECAPA-TDNN` 2-class classifier.
+
+ - **Input**: waveform \(float32\), shape `(batch, num_samples)` (typically 16 kHz).
+ - **Output**: logits of shape `(batch, 2)`, where **index 0 = spoof**, **index 1 = bonafide**.
+
+ On first run, the model will automatically download the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.
+
+ ## Quickstart
+
+ ### Clone from Hugging Face
+
+ This repository is hosted on Hugging Face Hub: `https://huggingface.co/MTUCI/spectra_0`.
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/MTUCI/spectra_0
+ cd spectra_0
+ ```
+
+ ### Install dependencies
+
+ ```bash
+ pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
+ ```
+
+ ### Single-file inference (example preprocessing)
+
+ ```python
+ import random
+ import torch
+ import torchaudio
+ import soundfile as sf
+
+ from model import spectra_0
+
+
+ def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
+     # x: (num_samples,) or (1, num_samples)
+     if x.ndim > 1:
+         x = x.squeeze()
+     x_len = x.shape[0]
+     if x_len >= max_len:
+         start = random.randint(0, x_len - max_len)
+         return x[start:start + max_len]
+     num_repeats = int(max_len / x_len) + 1
+     return x.repeat(num_repeats)[:max_len]
+
+
+ def load_audio_mono(path: str) -> torch.Tensor:
+     audio, sr = sf.read(path, dtype="float32")
+     audio = torch.from_numpy(audio)
+     if audio.ndim > 1:
+         # (num_samples, channels) -> mono
+         audio = audio.mean(dim=1)
+     if sr != 16000:
+         audio = torchaudio.functional.resample(audio, sr, 16000)
+     return audio
+
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = spectra_0.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)
+
+ audio = load_audio_mono("path/to/audio.wav")
+ audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
+ audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)  # (1, 64600)
+
+ with torch.inference_mode():
+     logits = model(audio.to(device))  # (1, 2)
+     score_spoof = logits[0, 0].item()
+     score_bonafide = logits[0, 1].item()
+
+ print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
+ ```
+
+ ## Threshold-based classification (and how to tune it)
+
+ In `model.py`, the `Spectra0Model` class provides `classify()` with a **default threshold** chosen as an “optimal” value for the original setting:
+
+ - **Default threshold**: `-1.0625009` (it thresholds `logit_bonafide = logits[:, 1]`)
+ - **Note**: this threshold **may not be optimal** on a different dataset/domain. It’s recommended to tune the threshold on your dataset using **EER** (Equal Error Rate) or a target FAR/FRR.
+
+ Example:
+
+ ```python
+ with torch.inference_mode():
+     pred = model.classify(audio.to(device), threshold=-1.0625009)  # 1=bonafide, 0=spoof
+ ```
+
+ ### Tuning the threshold via EER (typical workflow)
+
+ 1) Run the model on a labeled set and collect scores for both classes.
+
+ 2) Compute EER and the threshold
+
+ ## Limitations and notes
+
+ - This is a **pre-release** model.
+ - Significantly stronger models are planned for **Q3–Q4 2026** — stay tuned.
+
+ ## License
+
+ MIT (see the `license` field in the model repo header).
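
Note on the "Tuning the threshold via EER" section: the README's workflow ends at "Compute EER and the threshold" without showing how. The following is a minimal sketch of that step, not code from `model.py`. It assumes NumPy is installed, that the score is `logits[:, 1]` (higher means more bonafide), and that a sample is accepted as bonafide when its score is at or above the threshold. `compute_eer` is a hypothetical helper name.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for scores where higher means more bonafide.

    Hypothetical helper: a simple O(N^2) sweep over all observed scores.
    """
    bonafide_scores = np.asarray(bonafide_scores, dtype=np.float64)
    spoof_scores = np.asarray(spoof_scores, dtype=np.float64)

    # Candidate thresholds: every score seen in either class, ascending.
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))

    # FRR: fraction of bonafide samples rejected (score < threshold).
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # FAR: fraction of spoof samples accepted (score >= threshold).
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])

    # The EER point is where FAR and FRR are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = float((far[idx] + frr[idx]) / 2.0)
    return eer, float(thresholds[idx])


if __name__ == "__main__":
    # Synthetic scores for illustration only; real scores come from the model.
    rng = np.random.default_rng(0)
    bona = rng.normal(2.0, 1.0, size=500)
    spoof = rng.normal(-2.0, 1.0, size=500)
    eer, threshold = compute_eer(bona, spoof)
    print({"eer": eer, "threshold": threshold})
```

The returned threshold could then be passed as `threshold=` to `classify()`, provided `classify()` compares `logits[:, 1]` in the same direction as assumed here. For large evaluation sets, a vectorized ROC routine would be faster than this per-threshold loop.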