LCNN - ASVspoof 2019 LA Countermeasure
Light CNN for binary classification of bonafide vs spoofed speech, trained on the ASVspoof 2019 Logical Access (LA) dataset.
This is one of three models compared in our study (LCNN, RawNet2, Wav2Vec 2.0) under identical training/evaluation conditions.
Architecture
- 2D CNN over LFCC features
- Reference: Lavrentyeva et al., "Audio Replay Attack Detection with Deep Learning Frameworks", Interspeech 2017
| Parameter | Value |
|---|---|
| Input | LFCC (60 coefficients, 512 FFT, 160 hop, ~4 s audio) |
| Channels | [32, 48, 64, 128] |
| Kernel sizes | [5, 5, 3, 3] |
| FC hidden | 64 |
| Dropout | 0.3 |
See config.yaml for the full training/model configuration.
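As a rough check on the input geometry in the table above, the sketch below works out how many LFCC frames a ~4 s clip produces. It assumes centered STFT framing (one frame per hop, plus one) and an exact 4.0 s crop; neither is confirmed by this README, and `num_frames` is an illustrative helper, not a function from the source repo.

```python
# Input geometry implied by the table: 16 kHz audio, ~4 s clips,
# 512-point FFT, hop of 160 samples, 60 LFCC coefficients.
SAMPLE_RATE = 16_000   # Hz (from the Training section)
CLIP_SECONDS = 4.0     # "~4 s audio"; the exact crop length is an assumption
N_FFT = 512            # 32 ms window at 16 kHz
HOP = 160              # 10 ms hop at 16 kHz
N_LFCC = 60


def num_frames(num_samples: int, hop: int, n_fft: int, center: bool = True) -> int:
    """Number of STFT frames for a signal of `num_samples` samples."""
    if center:
        # Centered framing pads n_fft // 2 samples on each side of the signal.
        return 1 + num_samples // hop
    # Without padding, only windows that fit entirely inside the signal count.
    return 1 + (num_samples - n_fft) // hop


num_samples = int(SAMPLE_RATE * CLIP_SECONDS)
frames = num_frames(num_samples, HOP, N_FFT)
input_shape = (1, N_LFCC, frames)  # (channels, coefficients, time)
print(num_samples, frames, input_shape)
```

Under these assumptions each clip becomes a single-channel 60 x 401 "image" for the 2D CNN; without centering the time axis would be slightly shorter.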
Training
- Dataset: ASVspoof 2019 LA train split (~25k utterances)
- Batch size: 128
- Learning rate: 1e-4, cosine schedule
- Gradient clipping: 1.0
- Sample rate: 16 kHz mono
- No data augmentation
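The learning-rate entry above can be illustrated with the standard cosine-annealing formula. This is a sketch only: it assumes no warm-up and a floor of 0, and `total_steps` is a hypothetical value; the actual schedule settings live in config.yaml.

```python
import math

BASE_LR = 1e-4      # from the list above
BATCH_SIZE = 128    # from the list above
TRAIN_UTTS = 25_000 # "~25k utterances"; approximate


def cosine_lr(step: int, total_steps: int,
              base_lr: float = BASE_LR, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at `step`, clamped to [0, total_steps]."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Roughly how many optimizer steps one epoch takes at this batch size.
steps_per_epoch = math.ceil(TRAIN_UTTS / BATCH_SIZE)

total_steps = 100  # hypothetical schedule length, for illustration only
for step in (0, 25, 50, 75, 100):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

The rate starts at 1e-4, passes through half that value at the midpoint, and decays to the floor at the end of the schedule.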
Results (dev set)
Best checkpoint (epoch 18, selected by dev EER):
| Metric | Value |
|---|---|
| Dev EER | 0.03% |
| Dev min CM-DCF | 0.00027 |
Trajectory and loss curves: see learning_curves.png and metrics.csv.
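For readers unfamiliar with the EER figure in the table, it can be computed from per-utterance scores along the following lines. This is a minimal sketch, not the repo's evaluation code: it assumes higher scores mean "more likely bonafide", and both `compute_eer` and the toy scores are illustrative.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) from a brute-force threshold sweep.

    An utterance is accepted as bonafide when its score >= threshold.
    The sweep is O(n^2), which is fine for a sketch but slow on full splits.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects everything
    best = None
    for t in thresholds:
        # Miss: bonafide rejected. False alarm: spoof accepted.
        p_miss = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            # At the closest crossing point, report the average of both rates.
            best = (gap, (p_miss + p_fa) / 2.0, t)
    return best[1], best[2]


# Toy scores with one overlapping pair (0.35 bonafide vs 0.4 spoof).
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.4]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real score files the two rates rarely cross exactly at a sampled threshold, which is why the sketch averages them at the closest point.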
Note on the metric: this repo reports a simplified countermeasure-only min-DCF, not the full ASVspoof 2019 tandem DCF (t-DCF). Values are normalised by the cost of the cheaper of the two trivial systems (accept everything or reject everything), so 0 is perfect and 1 is no better than a constant classifier.
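One way to read the normalisation described above is sketched below. The prior and cost values (`p_spoof`, `c_miss`, `c_fa`) are placeholders chosen for illustration, not values taken from this repo, and `min_cm_dcf` is a hypothetical name; check the source repo for the settings actually used.

```python
def min_cm_dcf(bonafide_scores, spoof_scores,
               p_spoof=0.05, c_miss=1.0, c_fa=10.0):
    """Normalised minimum countermeasure-only detection cost.

    Higher score = more likely bonafide; accept when score >= threshold.
    p_spoof, c_miss and c_fa are placeholder assumptions.
    """
    p_bona = 1.0 - p_spoof
    # Costs of the two trivial systems:
    reject_all = c_miss * p_bona   # P_miss = 1, P_fa = 0
    accept_all = c_fa * p_spoof    # P_miss = 0, P_fa = 1
    norm = min(reject_all, accept_all)

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects everything
    best = float("inf")
    for t in thresholds:
        p_miss = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        dcf = c_miss * p_bona * p_miss + c_fa * p_spoof * p_fa
        best = min(best, dcf)
    return best / norm


print(min_cm_dcf([0.9, 0.8], [0.1, 0.2]))  # perfectly separated scores
print(min_cm_dcf([0.5, 0.5], [0.5, 0.5]))  # scores carry no information
```

Because the sweep includes the accept-all and reject-all thresholds, the normalised value cannot exceed 1.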
Caveat on dev performance: the dev split shares its attacks (A01–A06) with the training split. Eval-set performance against unseen attacks (A07–A19) is the meaningful generalisation number; that evaluation has not yet been run for this checkpoint.
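The overlap described above can be checked directly from the protocol files. The sketch below assumes the five-column ASVspoof 2019 LA protocol layout (speaker ID, utterance ID, an unused field, attack ID, key); the sample lines use made-up IDs for illustration, and `attack_ids` is a hypothetical helper.

```python
def attack_ids(protocol_lines):
    """Collect the set of attack IDs that appear on spoofed utterances."""
    ids = set()
    for line in protocol_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        _speaker, _utt, _unused, system_id, key = parts
        if key == "spoof":
            ids.add(system_id)
    return ids


# Illustrative lines only; real protocols have thousands of entries.
dev_lines = [
    "LA_0001 LA_D_0000001 - - bonafide",
    "LA_0001 LA_D_0000002 - A01 spoof",
    "LA_0002 LA_D_0000003 - A06 spoof",
]
eval_lines = [
    "LA_0003 LA_E_0000001 - - bonafide",
    "LA_0003 LA_E_0000002 - A07 spoof",
    "LA_0004 LA_E_0000003 - A19 spoof",
]

dev_attacks = attack_ids(dev_lines)
eval_attacks = attack_ids(eval_lines)
print(sorted(dev_attacks), sorted(eval_attacks))
print("shared attacks:", sorted(dev_attacks & eval_attacks))
```

Running the same helper on the train and dev protocols would show the shared attack set that makes the dev numbers optimistic.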
Usage
```python
import torch

# Load checkpoint
state = torch.load("best.pt", map_location="cpu")

# Plug into the LCNN model from the source repo:
# https://github.com/sebastiaoteixeira/caa-ai-generated-speech-detector
```
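Depending on how the checkpoint was saved, the loaded object may be a bare state dict or a wrapper dictionary that also holds optimizer state and metadata. The helper below is a hedged sketch for handling both cases; the wrapper key names and the `module.` prefix are common PyTorch conventions, not confirmed details of `best.pt`, and `unwrap_state_dict` is a hypothetical name.

```python
def unwrap_state_dict(checkpoint,
                      candidate_keys=("model_state_dict", "state_dict", "model")):
    """Return the model weights from a checkpoint of unknown layout.

    The candidate keys are guesses based on common conventions; inspect
    the checkpoint's keys to see which layout actually applies.
    """
    state = checkpoint
    if isinstance(checkpoint, dict):
        for key in candidate_keys:
            if key in checkpoint and isinstance(checkpoint[key], dict):
                state = checkpoint[key]
                break
    if isinstance(state, dict):
        # Weights saved from a DataParallel wrapper carry a "module." prefix.
        prefix = "module."
        state = {
            (k[len(prefix):] if isinstance(k, str) and k.startswith(prefix) else k): v
            for k, v in state.items()
        }
    return state


# Usage with the snippet above (model class name is hypothetical):
#   weights = unwrap_state_dict(state)
#   model = LCNN(...)               # constructed as in the source repo
#   model.load_state_dict(weights)
#   model.eval()
example = {"model_state_dict": {"module.conv1.weight": 0}, "epoch": 18}
print(list(unwrap_state_dict(example)))
```

Printing the top-level keys of the loaded object first is a quick way to tell which case applies before calling `load_state_dict`.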
License
MIT