# Deepfake Audio Detection Model

A machine learning model for detecting deepfake/synthetic audio using Wav2Vec2 embeddings and classical ML classifiers.
## Model Performance
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 92.86% | 0.95 | 0.93 | 0.93 |
| SVM | 85.71% | 0.89 | 0.86 | 0.85 |
| Random Forest | 78.57% | 0.85 | 0.79 | 0.76 |
**Best Model:** Logistic Regression with 92.86% accuracy
## Approach

### 1. Dataset

- Source: Real vs Fake Human Voice Deepfake Audio Dataset
- Size: 70 audio samples
- Classes: 5 (0, 1, 2, 3, 4)
- Distribution: perfectly balanced (14 samples per class)
### 2. Feature Extraction

We use Wav2Vec2 (`facebook/wav2vec2-base-960h`) to extract deep audio embeddings:

- Pre-trained self-supervised model
- Extracts 768-dimensional feature vectors
- Captures semantic audio information
- Handles variable-length audio automatically

Pipeline:

```
Audio File → Wav2Vec2 → 768-dim Embedding → Classifier → Prediction
```
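As a rough sketch, the embedding step can be written with the `transformers` library as below. The mean-pooling of frame-level hidden states into a single 768-dim vector is an assumption (a common choice); the original pipeline may pool differently.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_NAME = "facebook/wav2vec2-base-960h"

def mean_pool(hidden_states: torch.Tensor) -> torch.Tensor:
    """Average frame-level features over time: (1, frames, 768) -> (1, 768)."""
    return hidden_states.mean(dim=1)

def embed(waveform, sampling_rate=16000):
    """Turn a mono 16 kHz waveform into a single 768-dim embedding."""
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
    model = Wav2Vec2Model.from_pretrained(MODEL_NAME)
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, 768)
    return mean_pool(hidden).squeeze(0).numpy()     # (768,)
```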
### 3. Model Training

Three classifiers were trained and compared:

**Logistic Regression (Best)**
- Accuracy: 92.86%
- Multi-class classification with one-vs-rest (OvR) strategy
- Max iterations: 1000
- Features normalized with StandardScaler

**SVM**
- Accuracy: 85.71%
- RBF kernel
- Probability estimates enabled

**Random Forest**
- Accuracy: 78.57%
- 200 estimators
- Parallel processing enabled
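The comparison above can be sketched with scikit-learn. The random 70×768 matrix below is a synthetic stand-in for the real embeddings (so the printed accuracies will not match the table), and the hyperparameters are taken from the settings listed above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 70 x 768 embedding matrix (5 balanced classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 768))
y = np.repeat(np.arange(5), 14)

# 80-20 stratified split, then scale features on the training set only
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),
    "Random Forest": RandomForestClassifier(n_estimators=200, n_jobs=-1),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, clf.predict(X_te)):.2%}")
```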
### 4. Preprocessing

- Audio loading: supports both URLs and local files
- Resampling: all audio converted to 16 kHz
- Stereo to mono: averaged across channels
- Normalization: StandardScaler on embeddings
## Quick Start

### Installation

```bash
pip install transformers torch librosa soundfile scikit-learn huggingface-hub requests numpy
```
### Usage

#### Simple Prediction

```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

# Initialize detector (downloads model automatically)
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Predict from URL
result = detector.predict("https://your-audio-file.wav", is_url=True)
print(f"Prediction: {result['label']} ({result['confidence']:.1%})")
```
#### Batch Prediction

```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Multiple URLs
audio_urls = [
    "https://example.com/audio1.wav",
    "https://example.com/audio2.wav",
    "https://example.com/audio3.wav",
]
results = detector.predict_batch(audio_urls, are_urls=True)

# Print results
for result in results:
    if 'prediction' in result:
        print(f"{result['audio_source']}: {result['label']} ({result['confidence']:.1%})")
```
#### Local Files

```python
# Single file
result = detector.predict("path/to/audio.wav", is_url=False)

# Multiple files
local_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = detector.predict_batch(local_files, are_urls=False)
```
## Model Files

The model consists of three files hosted on Hugging Face:

- `deepfake_audio_classifier.pkl` - trained Logistic Regression classifier
- `audio_scaler.pkl` - StandardScaler for feature normalization
- `model_metadata.json` - model configuration and metadata

```json
{
  "model_type": "LogisticRegression",
  "accuracy": 0.9286,
  "feature_extractor": "facebook/wav2vec2-base-960h",
  "embedding_dim": 768,
  "num_classes": 5,
  "class_labels": {
    "0": "class_0",
    "1": "class_1",
    "2": "class_2",
    "3": "class_3",
    "4": "class_4"
  }
}
```
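Loading these three files manually can be sketched with `huggingface_hub`; the helper names `load_artifacts` and `label_for` are hypothetical, and in practice the `AudioDeepfakeDetectorFromHF` class above does this for you.

```python
import json
import pickle

from huggingface_hub import hf_hub_download

REPO_ID = "hjsgfd/deepfake_audio_classifier"

def load_artifacts(repo_id=REPO_ID):
    """Download and deserialize the classifier, scaler, and metadata from the Hub."""
    clf_path = hf_hub_download(repo_id, "deepfake_audio_classifier.pkl")
    scaler_path = hf_hub_download(repo_id, "audio_scaler.pkl")
    meta_path = hf_hub_download(repo_id, "model_metadata.json")
    with open(clf_path, "rb") as f:
        clf = pickle.load(f)
    with open(scaler_path, "rb") as f:
        scaler = pickle.load(f)
    with open(meta_path) as f:
        metadata = json.load(f)
    return clf, scaler, metadata

def label_for(metadata, class_id):
    """Map an integer class id to its label via the metadata's class_labels."""
    return metadata["class_labels"][str(class_id)]
```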
## Detailed Results

### Training Configuration

- Training samples: 56 (80%)
- Testing samples: 14 (20%)
- Feature dimension: 768
- Stratified split: maintains class distribution
### Logistic Regression Performance (Best Model)

```
              precision    recall  f1-score   support

     class_0       1.00      0.67      0.80         3
     class_1       1.00      1.00      1.00         2
     class_2       1.00      1.00      1.00         3
     class_3       0.75      1.00      0.86         3
     class_4       1.00      1.00      1.00         3

    accuracy                           0.93        14
   macro avg       0.95      0.93      0.93        14
weighted avg       0.95      0.93      0.93        14
```
### Key Metrics

- Macro average precision: 0.95
- Macro average recall: 0.93
- Macro average F1-score: 0.93
- Overall accuracy: 92.86%
## Technical Details

### Dependencies

```
transformers>=4.30.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
scikit-learn>=1.3.0
huggingface-hub>=0.16.0
requests>=2.31.0
numpy>=1.24.0
```
### Model Architecture

```
Input: Audio File (any format supported by soundfile)
        ↓
Preprocessing (16 kHz, mono)
        ↓
Wav2Vec2 Feature Extractor
        ↓
768-dimensional Embedding
        ↓
StandardScaler Normalization
        ↓
Logistic Regression Classifier
        ↓
Output: Class Prediction + Confidence Scores
```
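The last two stages of this diagram (scaling plus classification) reduce to a few lines of scikit-learn. The function name and return format below are assumptions modeled on the usage examples earlier, not the repository's actual code.

```python
import numpy as np

def classify_embedding(embedding, scaler, clf, class_labels):
    """Scale one 768-dim embedding and classify it, returning label + confidence."""
    x = scaler.transform(np.asarray(embedding).reshape(1, -1))
    probs = clf.predict_proba(x)[0]          # per-class probabilities
    idx = int(np.argmax(probs))              # most likely class index
    return {"label": class_labels[str(idx)], "confidence": float(probs[idx])}
```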
### Supported Audio Formats

- WAV
- MP3
- FLAC
- OGG
- M4A
## Training Process

1. Data loading: load the dataset with auto-decoding disabled
2. Feature extraction: extract Wav2Vec2 embeddings (768-dim vectors)
3. Train-test split: 80-20 stratified split
4. Normalization: fit StandardScaler on the training data
5. Model training: train 3 classifiers (LR, RF, SVM)
6. Evaluation: compare performance on the test set
7. Selection: choose the best model (Logistic Regression)
8. Export: save model, scaler, and metadata
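The final export step amounts to pickling the two fitted objects and dumping the metadata as JSON, so the three hosted files line up with the ones listed under Model Files. The helper name `export_artifacts` is hypothetical.

```python
import json
import pickle
from pathlib import Path

def export_artifacts(clf, scaler, metadata, out_dir="."):
    """Persist the classifier, scaler, and metadata as the three hosted files."""
    out = Path(out_dir)
    with open(out / "deepfake_audio_classifier.pkl", "wb") as f:
        pickle.dump(clf, f)
    with open(out / "audio_scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)
    with open(out / "model_metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)
```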
## Use Cases
- Deepfake audio detection
- Voice authentication systems
- Media verification tools
- Forensic audio analysis
- Content moderation platforms
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Citation

If you use this model, please cite:

```bibtex
@misc{deepfake_audio_classifier_2024,
  author = {Your Name},
  title = {Deepfake Audio Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hjsgfd/deepfake_audio_classifier}}
}
```
## Acknowledgments

- Dataset: `ud-nlp/real-vs-fake-human-voice-deepfake-audio`
- Feature extractor: `facebook/wav2vec2-base-960h`
- Transformers library: Hugging Face
## Contact
For questions or feedback, please open an issue on the repository.