LCNN - ASVspoof 2019 LA Countermeasure

Light CNN for binary classification of bonafide vs spoofed speech, trained on the ASVspoof 2019 Logical Access (LA) dataset.

This is one of three models compared in our study (LCNN, RawNet2, Wav2Vec 2.0) under identical training/evaluation conditions.

Architecture

  • 2D CNN over LFCC features
  • Reference: Lavrentyeva et al., "Audio Replay Attack Detection with Deep Learning Frameworks", Interspeech 2017

Parameter        Value
Input            LFCC (60 coefficients, 512-point FFT, 160-sample hop = 10 ms at 16 kHz, ~4 s of audio)
Conv channels    [32, 48, 64, 128]
Kernel sizes     [5, 5, 3, 3]
FC hidden units  64
Dropout          0.3

See config.yaml for the full training/model configuration.
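For orientation, here is a minimal PyTorch sketch of an LCNN-style network built from the hyperparameters in the table. It is not the source repo's implementation. The per-stage layout (Conv2d, Max-Feature-Map, BatchNorm, 2x2 max-pooling), the global average pooling and the two-logit output are assumptions; Max-Feature-Map is borrowed from the LCNN family described by Lavrentyeva et al. `LCNNSketch` and `MaxFeatureMap` are hypothetical names.

```python
import torch
import torch.nn as nn


class MaxFeatureMap(nn.Module):
    """Max-Feature-Map (MFM): split channels into two halves, keep the element-wise max."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)
        return torch.maximum(a, b)


class LCNNSketch(nn.Module):
    """Hypothetical LCNN-style classifier over LFCC input of shape (batch, 1, n_lfcc, frames)."""

    def __init__(self, channels=(32, 48, 64, 128), kernels=(5, 5, 3, 3),
                 fc_hidden=64, dropout=0.3, num_classes=2):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch, k in zip(channels, kernels):
            layers += [
                # The conv produces 2*out_ch maps; MFM halves them back to out_ch.
                nn.Conv2d(in_ch, 2 * out_ch, kernel_size=k, padding=k // 2),
                MaxFeatureMap(),
                nn.BatchNorm2d(out_ch),
                nn.MaxPool2d(2),  # halves both the coefficient and the time axis
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # Global average pooling makes the head independent of clip length.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch, fc_hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(fc_hidden, num_classes),  # assumed: two-class logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.pool(self.features(x)))


model = LCNNSketch().eval()
lfcc = torch.randn(2, 1, 60, 401)  # 2 clips, 60 LFCCs, 401 frames (~4 s at a 160-sample hop)
print(model(lfcc).shape)  # torch.Size([2, 2])
```

Because of the global pooling, the sketch accepts any number of frames, so a clip-length mismatch would not raise an error here even if it would hurt accuracy.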

Training

  • Dataset: ASVspoof 2019 LA train split (~25k utterances)
  • Batch size: 128
  • Learning rate: 1e-4, cosine schedule
  • Gradient clipping: 1.0
  • Sample rate: 16 kHz mono
  • No data augmentation
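The optimisation settings above can be sketched as a standard PyTorch loop. The card only fixes the learning rate, the cosine schedule, the clipping value and the batch size; the Adam optimizer, per-step `CosineAnnealingLR` decay to zero, cross-entropy loss, the stand-in linear model and the random batches are all assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the LCNN: any nn.Module producing 2 logits works here.
model = nn.Sequential(nn.Flatten(), nn.Linear(60 * 401, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer; lr from the card

total_steps = 20  # hypothetical; in practice epochs * batches_per_epoch
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
criterion = nn.CrossEntropyLoss()

for step in range(total_steps):
    # Random batch of 128 LFCC "images" with binary labels, for illustration only.
    x = torch.randn(128, 1, 60, 401)
    y = torch.randint(0, 2, (128,))

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Clip the global gradient norm to 1.0 before the update, as in the card.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # cosine decay from 1e-4 towards 0 over total_steps

print(scheduler.get_last_lr())  # the learning rate has decayed to (about) zero
```

Stepping the scheduler once per batch gives a smooth decay; stepping it once per epoch with `T_max` set to the number of epochs is an equally common choice.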

Results (dev set)

Best checkpoint (epoch 18, selected by dev EER):

Metric            Value
Dev EER           0.03%
Dev min CM-DCF    0.00027

Trajectory and loss curves: see learning_curves.png and metrics.csv.

Note on the metric: this repo reports a simplified, countermeasure-only min-DCF rather than the full ASVspoof 2019 tandem detection cost function (t-DCF). Values are normalised by the cost of the cheaper trivial system (always accept or always reject), so 0 = perfect and 1 = no better than a constant classifier.
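A minimal NumPy sketch of both reported metrics, assuming higher scores mean "more bonafide". Both are computed by sweeping the threshold over the observed scores, which is one common approach. The cost parameters `c_miss`, `c_fa` and `p_spoof` are placeholders, not the values used in this repo, and the function names are hypothetical.

```python
import numpy as np


def error_rates(bona, spoof):
    """Miss and false-alarm rates at every candidate threshold.

    A trial is accepted as bonafide when its score is >= the threshold.
    """
    bona, spoof = np.sort(bona), np.sort(spoof)
    # Candidate thresholds: every observed score (the lowest accepts everything)
    # plus +inf, which rejects everything.
    thresholds = np.append(np.sort(np.concatenate([bona, spoof])), np.inf)
    # Miss: bonafide trials scoring below the threshold (wrongly rejected).
    p_miss = np.searchsorted(bona, thresholds, side="left") / len(bona)
    # False alarm: spoof trials scoring at or above the threshold (wrongly accepted).
    p_fa = 1.0 - np.searchsorted(spoof, thresholds, side="left") / len(spoof)
    return p_miss, p_fa


def compute_eer(bona, spoof):
    """Equal error rate, approximated at the threshold where the two rates are closest."""
    p_miss, p_fa = error_rates(bona, spoof)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2


def normalised_min_dcf(bona, spoof, c_miss=1.0, c_fa=10.0, p_spoof=0.05):
    """CM-only min-DCF divided by the cost of the cheaper trivial system.

    c_miss, c_fa and p_spoof are placeholder values, not this repo's settings.
    """
    p_miss, p_fa = error_rates(bona, spoof)
    dcf = c_miss * (1 - p_spoof) * p_miss + c_fa * p_spoof * p_fa
    # Trivial systems: reject all (cost c_miss*(1-p_spoof)) or accept all (cost c_fa*p_spoof).
    return dcf.min() / min(c_miss * (1 - p_spoof), c_fa * p_spoof)


bona = np.array([2.1, 1.7, 0.9, 3.0])      # toy bonafide scores
spoof = np.array([-1.5, -0.2, 1.0, -2.3])  # toy spoof scores
print(compute_eer(bona, spoof))         # 0.25
print(normalised_min_dcf(bona, spoof))  # 0.25
```

Because both trivial systems appear among the swept thresholds, the normalised value can never exceed 1, which matches the interpretation given in the note.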

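Caveat on dev performance: the dev set uses the same attacks (A01–A06) as the training split, so these numbers mostly reflect known attacks. The meaningful generalisation number is eval-set performance against attacks A07–A19, most of which are unseen in training (A16 and A19 reuse the A04 and A06 algorithms); that evaluation has not yet been run for this checkpoint.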
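When the eval run happens, a per-attack breakdown makes the seen/unseen split visible. A minimal sketch, assuming the five-column layout of the official ASVspoof 2019 LA CM protocol files (`speaker utterance - attack key`, with `-` as the attack ID for bonafide trials) and a hypothetical `scores` dict mapping utterance IDs to CM scores. Each attack's spoof scores are paired with the full bonafide pool, and each pair can then be passed to any EER routine. The protocol lines and scores below are made up.

```python
from collections import defaultdict


def split_by_attack(protocol_lines, scores):
    """Group CM scores into the bonafide pool and one spoof pool per attack ID.

    protocol_lines: lines such as "LA_0001 LA_E_0000002 - A11 spoof"
    scores: hypothetical dict mapping utterance ID -> CM score
    """
    bonafide, spoof_by_attack = [], defaultdict(list)
    for line in protocol_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _speaker, utt_id, _unused, attack, key = fields
        if utt_id not in scores:
            continue  # utterance was not scored
        if key == "bonafide":
            bonafide.append(scores[utt_id])
        else:
            spoof_by_attack[attack].append(scores[utt_id])
    # One (bonafide scores, spoof scores) pair per attack, ready for an EER function.
    return {attack: (bonafide, spoofs) for attack, spoofs in sorted(spoof_by_attack.items())}


protocol = [
    "LA_0001 LA_E_0000001 - - bonafide",
    "LA_0001 LA_E_0000002 - A11 spoof",
    "LA_0002 LA_E_0000003 - A07 spoof",
    "LA_0002 LA_E_0000004 - A11 spoof",
]
scores = {"LA_E_0000001": 2.4, "LA_E_0000002": -1.2, "LA_E_0000003": 0.3, "LA_E_0000004": -0.8}
for attack, (bona, spoof) in split_by_attack(protocol, scores).items():
    print(attack, len(bona), spoof)
```

Keeping the bonafide pool shared across attacks means every per-attack EER is measured against the same genuine speech, so differences between attacks come from the spoof scores alone.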

Usage

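```python
import torch

# Load the checkpoint on CPU so no GPU is required.
state = torch.load("best.pt", map_location="cpu")
# Inspect how the weights are stored before calling model.load_state_dict(...).
print(type(state).__name__, list(state)[:10] if isinstance(state, dict) else "")
# Plug into the LCNN model from the source repo:
# https://github.com/sebastiaoteixeira/caa-ai-generated-speech-detector
```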

License

MIT
