🎵 Deepfake Audio Detection Model

A machine learning model to detect deepfake/synthetic audio using Wav2Vec2 embeddings and classical ML classifiers.

Python 3.8+ · License: MIT · Hosted on Hugging Face

📊 Model Performance

Model                 Accuracy   Precision   Recall   F1-Score
Logistic Regression   92.86%     0.95        0.93     0.93
SVM                   85.71%     0.89        0.86     0.85
Random Forest         78.57%     0.85        0.79     0.76

Best Model: Logistic Regression with 92.86% accuracy

🎯 Approach

1. Dataset

The dataset comprises 70 labeled audio clips across 5 classes, split 80/20 into 56 training and 14 test samples (see Detailed Results below).

2. Feature Extraction

We use Wav2Vec2 (facebook/wav2vec2-base-960h) to extract deep audio embeddings:

  • Pre-trained self-supervised model
  • Extracts 768-dimensional feature vectors
  • Captures semantic audio information
  • Handles variable-length audio automatically

Pipeline:

Audio File → Wav2Vec2 → 768-dim Embedding → Classifier → Prediction
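Wav2Vec2 emits one 768-dimensional hidden state per audio frame, so clip length determines how many vectors come out; pooling over the time axis is what produces a single fixed-size embedding. The card does not state which pooling is used, so the mean-pooling below is an assumed (and common) choice, sketched with random arrays standing in for real model outputs:

```python
import numpy as np

# Wav2Vec2 produces one 768-dim hidden state per ~20 ms frame, so clips of
# different lengths yield matrices with different numbers of rows.
short_clip_states = np.random.rand(150, 768)   # ~3 s of audio
long_clip_states = np.random.rand(500, 768)    # ~10 s of audio

# Mean-pooling over the time axis collapses both to a fixed 768-dim vector,
# which is what the downstream classifier consumes.
short_emb = short_clip_states.mean(axis=0)
long_emb = long_clip_states.mean(axis=0)

print(short_emb.shape, long_emb.shape)  # both (768,)
```

This is how "handles variable-length audio automatically" works in practice: the classifier never sees the time axis, only the pooled vector.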

3. Model Training

Three classifiers were trained and compared:

Logistic Regression (Best)

  • Accuracy: 92.86%
  • Multi-class classification with OvR strategy
  • Max iterations: 1000
  • Features: StandardScaler normalized

SVM

  • Accuracy: 85.71%
  • RBF kernel
  • Probability estimates enabled

Random Forest

  • Accuracy: 78.57%
  • 200 estimators
  • Parallel processing enabled
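The comparison above can be sketched end to end with the stated hyperparameters. Synthetic data stands in for the real Wav2Vec2 embeddings, so the accuracies this prints are illustrative, not the card's numbers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 768-dim embeddings: 70 samples, 5 classes.
X, y = make_classification(n_samples=70, n_features=768, n_informative=20,
                           n_classes=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    # The card mentions an OvR strategy; scikit-learn's default multiclass
    # handling is used here for simplicity.
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),
    "Random Forest": RandomForestClassifier(n_estimators=200, n_jobs=-1),
}
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
print(scores)
```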

4. Preprocessing

  • Audio Loading: Support for both URLs and local files
  • Resampling: All audio converted to 16kHz
  • Stereo to Mono: Averaged across channels
  • Normalization: StandardScaler on embeddings
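The stereo-to-mono step is plain channel averaging; a minimal sketch, with a random array standing in for a stereo clip as soundfile would load it:

```python
import numpy as np

# Stand-in for a stereo clip as loaded by soundfile: shape (frames, channels).
stereo = np.random.rand(48000, 2).astype(np.float32)

# Stereo to mono: average across the channel axis.
mono = stereo.mean(axis=1)

# Resampling to 16 kHz would follow, e.g.
# librosa.resample(mono, orig_sr=48000, target_sr=16000)
print(mono.shape)  # (48000,)
```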

🚀 Quick Start

Installation

pip install transformers torch librosa soundfile scikit-learn huggingface-hub requests numpy

Usage

Simple Prediction

from predict_from_hf import AudioDeepfakeDetectorFromHF

# Initialize detector (downloads model automatically)
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Predict from URL
result = detector.predict("https://your-audio-file.wav", is_url=True)
print(f"Prediction: {result['label']} ({result['confidence']:.1%})")

Batch Prediction

from predict_from_hf import AudioDeepfakeDetectorFromHF

detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Multiple URLs
audio_urls = [
    "https://example.com/audio1.wav",
    "https://example.com/audio2.wav",
    "https://example.com/audio3.wav",
]

results = detector.predict_batch(audio_urls, are_urls=True)

# Print results
for result in results:
    if 'prediction' in result:
        print(f"{result['audio_source']}: {result['label']} ({result['confidence']:.1%})")

Local Files

# Single file
result = detector.predict("path/to/audio.wav", is_url=False)

# Multiple files
local_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = detector.predict_batch(local_files, are_urls=False)

πŸ“ Model Files

The model consists of three files hosted on Hugging Face:

  1. deepfake_audio_classifier.pkl - Trained Logistic Regression classifier
  2. audio_scaler.pkl - StandardScaler for feature normalization
  3. model_metadata.json - Model configuration and metadata

model_metadata.json contents:
{
  "model_type": "LogisticRegression",
  "accuracy": 0.9286,
  "feature_extractor": "facebook/wav2vec2-base-960h",
  "embedding_dim": 768,
  "num_classes": 5,
  "class_labels": {
    "0": "class_0",
    "1": "class_1",
    "2": "class_2",
    "3": "class_3",
    "4": "class_4"
  }
}
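A sketch of how such a trio of files can be produced and reloaded. The card does not say which library created the pickles; the standard-library pickle module is assumed here, with toy data in place of the real training set:

```python
import json
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Fit a toy scaler + classifier so there is something to serialize.
rng = np.random.default_rng(0)
X = rng.random((20, 768))
y = rng.integers(0, 5, size=20)
scaler = StandardScaler().fit(X)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)

out = Path(tempfile.mkdtemp())
(out / "deepfake_audio_classifier.pkl").write_bytes(pickle.dumps(clf))
(out / "audio_scaler.pkl").write_bytes(pickle.dumps(scaler))
(out / "model_metadata.json").write_text(
    json.dumps({"embedding_dim": 768, "num_classes": 5}))

# Reloading mirrors what a detector would do after downloading the files.
clf2 = pickle.loads((out / "deepfake_audio_classifier.pkl").read_bytes())
pred = clf2.predict(scaler.transform(X[:1]))
```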

📈 Detailed Results

Training Configuration

  • Training Samples: 56 (80%)
  • Testing Samples: 14 (20%)
  • Feature Dimension: 768
  • Stratified Split: Maintains class distribution
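The stratified split can be sketched as follows, assuming the classes are roughly balanced (14 clips each, which is consistent with the per-class support in the classification report):

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# 70 samples in 5 classes of 14 each (assumed balanced), 768-dim features.
X = np.zeros((70, 768))
y = np.repeat(np.arange(5), 14)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 80/20 split: 56 train / 14 test, every class represented in the test set.
print(len(X_tr), len(X_te), sorted(Counter(y_te).items()))
```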

Logistic Regression Performance (Best Model)

              precision    recall  f1-score   support

     class_0       1.00      0.67      0.80         3
     class_1       1.00      1.00      1.00         2
     class_2       1.00      1.00      1.00         3
     class_3       0.75      1.00      0.86         3
     class_4       1.00      1.00      1.00         3

    accuracy                           0.93        14
   macro avg       0.95      0.93      0.93        14
weighted avg       0.95      0.93      0.93        14

Key Metrics

  • Macro Average Precision: 0.95
  • Macro Average Recall: 0.93
  • Macro Average F1-Score: 0.93
  • Overall Accuracy: 92.86%
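The macro averages are plain means of the per-class scores in the report above, and 92.86% accuracy corresponds to 13 of 14 test samples classified correctly; a quick arithmetic check:

```python
import numpy as np

# Per-class scores copied from the classification report.
precision = np.array([1.00, 1.00, 1.00, 0.75, 1.00])
recall = np.array([0.67, 1.00, 1.00, 1.00, 1.00])
f1 = np.array([0.80, 1.00, 1.00, 0.86, 1.00])

macro_precision = precision.mean()  # 0.95
macro_recall = recall.mean()        # 0.934 -> 0.93
macro_f1 = f1.mean()                # 0.932 -> 0.93

accuracy = 13 / 14  # one test sample misclassified -> 0.9286 (92.86%)
print(round(macro_precision, 2), round(macro_recall, 2),
      round(macro_f1, 2), round(accuracy, 4))
```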

🔧 Technical Details

Dependencies

transformers>=4.30.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
scikit-learn>=1.3.0
huggingface-hub>=0.16.0
requests>=2.31.0
numpy>=1.24.0

Model Architecture

Input: Audio File (any format supported by soundfile)
  ↓
Preprocessing (16kHz, Mono)
  ↓
Wav2Vec2 Feature Extractor
  ↓
768-dimensional Embedding
  ↓
StandardScaler Normalization
  ↓
Logistic Regression Classifier
  ↓
Output: Class Prediction + Confidence Scores

Supported Audio Formats

  • WAV
  • MP3
  • FLAC
  • OGG
  • M4A

📊 Training Process

  1. Data Loading: Load dataset with disabled auto-decoding
  2. Feature Extraction: Extract Wav2Vec2 embeddings (768-dim vectors)
  3. Train-Test Split: 80-20 stratified split
  4. Normalization: StandardScaler on training data
  5. Model Training: Train 3 classifiers (LR, RF, SVM)
  6. Evaluation: Compare performance on test set
  7. Selection: Choose best model (Logistic Regression)
  8. Export: Save model, scaler, and metadata

🎯 Use Cases

  • Deepfake audio detection
  • Voice authentication systems
  • Media verification tools
  • Forensic audio analysis
  • Content moderation platforms

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

πŸ“ Citation

If you use this model, please cite:

@misc{deepfake_audio_classifier_2024,
  author = {Your Name},
  title = {Deepfake Audio Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hjsgfd/deepfake_audio_classifier}}
}

πŸ™ Acknowledgments

📧 Contact

For questions or feedback, please open an issue on the repository.

