# Model Card: Deepfake-Audio-Wav2Vec2
## Model Overview
Deepfake-Audio-Wav2Vec2 is a fine-tuned audio classification model trained to distinguish real from spoofed (deepfake) speech.
The model is built on top of facebook/wav2vec2-base, a self-supervised speech representation model, and adapted for binary deepfake audio detection.
It learns subtle acoustic artifacts and synthetic speech patterns that differentiate genuine human recordings from AI-generated or manipulated audio samples.
This model is intended for:
- Deepfake voice detection
- Audio authenticity verification
- Research in anti-spoofing systems
- Security pipelines for voice-based applications
## Training Details
| Parameter | Value |
|---|---|
| Base Model | facebook/wav2vec2-base |
| Framework | Hugging Face Transformers |
| Training Hardware | GPU (CUDA) |
| Task Type | Audio Classification |
| Classes | bonafide / spoof |
| Audio Sample Rate | 16 kHz |
| Input Duration | Fixed audio segments |
| Optimization | AdamW |
| Loss Function | Cross Entropy |
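The table lists cross-entropy as the training loss. As a reference for what that objective computes, here is a minimal plain-Python sketch of softmax cross-entropy over two-class logits (an illustration of the loss function, not this model's actual training code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class index."""
    probs = softmax(logits)
    return -math.log(probs[target])

# Hypothetical two-class logits (bonafide vs. spoof): a confident
# correct prediction yields a low loss, a confident wrong one a high loss.
low_loss = cross_entropy([2.0, -1.0], target=0)
high_loss = cross_entropy([-1.0, 2.0], target=0)
```

During fine-tuning this loss is averaged over a batch and minimized with AdamW, as noted in the table above.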
## Label Classes
- **Bonafide**: real / authentic speech
- **Spoof**: deepfake / synthetic audio
## Evaluation Metrics
| Metric | Score |
|---|---|
| Accuracy | 92.8% |
| Precision | 89.7% |
| Recall | 88.0% |
| F1 Score | 88.4% |
The model demonstrates strong detection performance across both bonafide and spoofed samples.
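The reported metrics follow the standard confusion-matrix definitions. A small sketch computing them from raw counts (the counts below are hypothetical examples, not this model's actual confusion matrix):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, treating "spoof" as the positive class
metrics = classification_metrics(tp=88, fp=10, fn=12, tn=90)
```

Note that F1 is the harmonic mean of precision and recall, so it always lies between the two.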
## Dataset Description
The model was trained on a balanced subset of the ASVspoof 2021 PA dataset for binary anti-spoofing classification.
The dataset includes:
- Genuine speech recordings
- Spoofed / manipulated audio samples
- Replay and synthetic attack scenarios
Training was performed on balanced class samples to improve robustness across both labels.
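The fixed input duration mentioned above implies that each clip is trimmed or padded to a constant number of samples before feature extraction. A minimal sketch at 16 kHz (the 4-second segment length here is an assumed value for illustration; the model card does not document the exact duration used):

```python
SAMPLE_RATE = 16000       # model's expected sample rate
SEGMENT_SECONDS = 4       # assumed duration, for illustration only
TARGET_LEN = SAMPLE_RATE * SEGMENT_SECONDS

def fix_length(samples, target_len=TARGET_LEN):
    """Trim long clips and zero-pad short ones to a fixed sample count."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))

# A 2-second clip is zero-padded; a 10-second clip is trimmed.
short_clip = fix_length([0.1] * (2 * SAMPLE_RATE))
long_clip = fix_length([0.1] * (10 * SAMPLE_RATE))
```

In practice the Hugging Face feature extractor can perform equivalent padding/truncation via its `padding`, `truncation`, and `max_length` arguments.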
## Example Usage
```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "Vansh180/deepfake-audio-wav2vec2"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)
model.eval()

def predict_audio(audio_path):
    wav, sr = torchaudio.load(audio_path)
    # Downmix multi-channel audio to mono
    if wav.shape[0] > 1:
        wav = wav.mean(dim=0, keepdim=True)
    # Resample to the 16 kHz rate the model expects
    if sr != 16000:
        wav = torchaudio.functional.resample(wav, sr, 16000)
    inputs = feature_extractor(
        wav.squeeze().numpy(),
        sampling_rate=16000,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    return {
        "prediction": model.config.id2label[prediction],
        "confidence": probs[0][prediction].item(),
    }

print(predict_audio("sample.wav"))
```