🧾 Model Card: Deepfake-Audio-Wav2Vec2


🧠 Model Overview

Deepfake-Audio-Wav2Vec2 is a fine-tuned audio classification model trained to distinguish real from spoofed (deepfake) speech audio.

The model is built on top of facebook/wav2vec2-base, a self-supervised speech representation model, and adapted for binary deepfake audio detection.

It learns subtle acoustic artifacts and synthetic speech patterns that differentiate genuine human recordings from AI-generated or manipulated audio samples.

This model is intended for:

  • Deepfake voice detection
  • Audio authenticity verification
  • Research in anti-spoofing systems
  • Security pipelines for voice-based applications

πŸ—οΈ Training Details

| Parameter | Value |
|---|---|
| Base Model | facebook/wav2vec2-base |
| Framework | Hugging Face Transformers |
| Training Hardware | GPU (CUDA) |
| Task Type | Audio Classification |
| Classes | bonafide / spoof |
| Audio Sample Rate | 16 kHz |
| Input Duration | Fixed-length audio segments |
| Optimizer | AdamW |
| Loss Function | Cross-Entropy |
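
For reference, the snippet below shows how a fine-tuning run with the components in this table can be wired up. It is a minimal sketch, not the released training script: the learning rate, the label-to-index order, and the `train_loader` (assumed to yield fixed-length 16 kHz mono clips with labels) are illustrative assumptions.

```python
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

base_id = "facebook/wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(base_id)
model = AutoModelForAudioClassification.from_pretrained(
    base_id,
    num_labels=2,
    label2id={"bonafide": 0, "spoof": 1},  # assumed label order
    id2label={0: "bonafide", 1: "spoof"},
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # AdamW per the table; lr is assumed
criterion = torch.nn.CrossEntropyLoss()                     # cross-entropy per the table

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()

def train_epoch(train_loader):
    # train_loader is assumed to yield (waveforms, labels) batches of
    # fixed-length 16 kHz mono clips, matching the table above.
    for waveforms, labels in train_loader:
        inputs = feature_extractor(
            [w.numpy() for w in waveforms],
            sampling_rate=16000,
            return_tensors="pt",
            padding=True,
        ).to(device)
        logits = model(**inputs).logits
        loss = criterion(logits, labels.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```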

🎯 Label Classes

  • 🟢 Bonafide → Real / authentic speech
  • 🔴 Spoof → Deepfake / synthetic audio
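
The index behind each label is stored in the checkpoint configuration, so it can be checked directly rather than assumed:

```python
from transformers import AutoConfig

# Inspect the label mapping exported with the checkpoint.
config = AutoConfig.from_pretrained("Vansh180/deepfake-audio-wav2vec2")
print(config.id2label)  # e.g. {0: "bonafide", 1: "spoof"}
```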

📊 Evaluation Metrics

| Metric | Score |
|---|---|
| Accuracy | 92.8% |
| Precision | 89.7% |
| Recall | 88.0% |
| F1 Score | 88.4% |

✅ The model demonstrates strong detection performance across real and spoofed samples.
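
To compute the same metrics on your own held-out set, standard scikit-learn helpers suffice. The sketch below is illustrative: the `y_true` / `y_pred` lists stand in for labels and model predictions over a real evaluation split, and treating "spoof" as the positive class is an assumption.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels/predictions; in practice these come from running
# the model over a labeled evaluation split (0 = bonafide, 1 = spoof).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1  # "spoof" as positive class
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```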


📂 Dataset Description

The model was trained on a balanced subset of the ASVspoof 2021 PA dataset for binary anti-spoofing classification.

The dataset includes:

  • Genuine speech recordings
  • Spoofed / manipulated audio samples
  • Replay and synthetic attack scenarios

Training was performed on balanced class samples to improve robustness across both labels.
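
One common way to build such a balanced split is to downsample the majority class. The sketch below is illustrative only: the `metadata` list of (path, label) pairs is an assumption standing in for the parsed ASVspoof protocol files, not the actual preparation code.

```python
import random

def balance_subset(metadata, seed=42):
    """Downsample the majority class to a 50/50 bonafide-spoof split.

    metadata: list of (audio_path, label) pairs, label in {"bonafide", "spoof"}.
    """
    rng = random.Random(seed)
    bonafide = [m for m in metadata if m[1] == "bonafide"]
    spoof = [m for m in metadata if m[1] == "spoof"]
    n = min(len(bonafide), len(spoof))
    subset = rng.sample(bonafide, n) + rng.sample(spoof, n)
    rng.shuffle(subset)
    return subset
```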


💻 Example Usage

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "Vansh180/deepfake-audio-wav2vec2"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)
model.eval()

def predict_audio(audio_path):
    wav, sr = torchaudio.load(audio_path)

    # Downmix multi-channel audio to mono.
    if wav.shape[0] > 1:
        wav = wav.mean(dim=0, keepdim=True)

    # Resample to the 16 kHz rate the model was trained on.
    if sr != 16000:
        wav = torchaudio.functional.resample(wav, sr, 16000)

    inputs = feature_extractor(
        wav.squeeze().numpy(),
        sampling_rate=16000,
        return_tensors="pt"
    )

    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)

    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()

    return {
        "prediction": model.config.id2label[prediction],
        "confidence": confidence
    }

print(predict_audio("sample.wav"))
```