hjsgfd's picture
Update README.md
0fa3674 verified
# 🎡 Deepfake Audio Detection Model
A machine learning model to detect deepfake/synthetic audio using Wav2Vec2 embeddings and classical ML classifiers.
[![Hugging Face](https://img.shields.io/badge/πŸ€—%20Hugging%20Face-Model-yellow)](https://huggingface.co/hjsgfd/deepfake_audio_classifier)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
## πŸ“Š Model Performance
| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| **Logistic Regression** | **92.86%** | 0.95 | 0.93 | 0.93 |
| SVM | 85.71% | 0.89 | 0.86 | 0.85 |
| Random Forest | 78.57% | 0.85 | 0.79 | 0.76 |
**Best Model: Logistic Regression with 92.86% accuracy**
## 🎯 Approach
### 1. Dataset
- **Source**: [Real vs Fake Human Voice Deepfake Audio Dataset](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Size**: 70 audio samples
- **Classes**: 5 classes (0, 1, 2, 3, 4)
- **Distribution**: Perfectly balanced (14 samples per class)
### 2. Feature Extraction
We use **Wav2Vec2** (facebook/wav2vec2-base-960h) to extract deep audio embeddings:
- Pre-trained self-supervised model
- Extracts 768-dimensional feature vectors
- Captures semantic audio information
- Handles variable-length audio automatically
**Pipeline:**
```
Audio File β†’ Wav2Vec2 β†’ 768-dim Embedding β†’ Classifier β†’ Prediction
```
### 3. Model Training
Three classifiers were trained and compared:
#### Logistic Regression (Best)
- **Accuracy**: 92.86%
- Multi-class classification with OvR strategy
- Max iterations: 1000
- Features: StandardScaler normalized
#### SVM
- **Accuracy**: 85.71%
- RBF kernel
- Probability estimates enabled
#### Random Forest
- **Accuracy**: 78.57%
- 200 estimators
- Parallel processing enabled
### 4. Preprocessing
- **Audio Loading**: Support for both URLs and local files
- **Resampling**: All audio converted to 16kHz
- **Stereo to Mono**: Averaged across channels
- **Normalization**: StandardScaler on embeddings
## πŸš€ Quick Start
### Installation
```bash
pip install transformers torch librosa soundfile scikit-learn huggingface-hub requests numpy
```
### Usage
#### Simple Prediction
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF
# Initialize detector (downloads model automatically)
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")
# Predict from URL
result = detector.predict("https://your-audio-file.wav", is_url=True)
print(f"Prediction: {result['label']} ({result['confidence']:.1%})")
```
#### Batch Prediction
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")
# Multiple URLs
audio_urls = [
"https://example.com/audio1.wav",
"https://example.com/audio2.wav",
"https://example.com/audio3.wav",
]
results = detector.predict_batch(audio_urls, are_urls=True)
# Print results
for result in results:
if 'prediction' in result:
print(f"{result['audio_source']}: {result['label']} ({result['confidence']:.1%})")
```
#### Local Files
```python
# Single file
result = detector.predict("path/to/audio.wav", is_url=False)
# Multiple files
local_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = detector.predict_batch(local_files, are_urls=False)
```
## πŸ“ Model Files
The model consists of three files hosted on Hugging Face:
1. **deepfake_audio_classifier.pkl** - Trained Logistic Regression classifier
2. **audio_scaler.pkl** - StandardScaler for feature normalization
3. **model_metadata.json** - Model configuration and metadata
```json
{
"model_type": "LogisticRegression",
"accuracy": 0.9286,
"feature_extractor": "facebook/wav2vec2-base-960h",
"embedding_dim": 768,
"num_classes": 5,
"class_labels": {
"0": "class_0",
"1": "class_1",
"2": "class_2",
"3": "class_3",
"4": "class_4"
}
}
```
## πŸ“ˆ Detailed Results
### Training Configuration
- **Training Samples**: 56 (80%)
- **Testing Samples**: 14 (20%)
- **Feature Dimension**: 768
- **Stratified Split**: Maintains class distribution
### Logistic Regression Performance (Best Model)
```
precision recall f1-score support
class_0 1.00 0.67 0.80 3
class_1 1.00 1.00 1.00 2
class_2 1.00 1.00 1.00 3
class_3 0.75 1.00 0.86 3
class_4 1.00 1.00 1.00 3
accuracy 0.93 14
macro avg 0.95 0.93 0.93 14
weighted avg 0.95 0.93 0.93 14
```
### Key Metrics
- **Macro Average Precision**: 0.95
- **Macro Average Recall**: 0.93
- **Macro Average F1-Score**: 0.93
- **Overall Accuracy**: 92.86%
## πŸ”§ Technical Details
### Dependencies
```
transformers>=4.30.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
scikit-learn>=1.3.0
huggingface-hub>=0.16.0
requests>=2.31.0
numpy>=1.24.0
```
### Model Architecture
```
Input: Audio File (any format supported by soundfile)
↓
Preprocessing (16kHz, Mono)
↓
Wav2Vec2 Feature Extractor
↓
768-dimensional Embedding
↓
StandardScaler Normalization
↓
Logistic Regression Classifier
↓
Output: Class Prediction + Confidence Scores
```
### Supported Audio Formats
- WAV
- MP3
- FLAC
- OGG
- M4A
## πŸ“Š Training Process
1. **Data Loading**: Load dataset with disabled auto-decoding
2. **Feature Extraction**: Extract Wav2Vec2 embeddings (768-dim vectors)
3. **Train-Test Split**: 80-20 stratified split
4. **Normalization**: StandardScaler on training data
5. **Model Training**: Train 3 classifiers (LR, RF, SVM)
6. **Evaluation**: Compare performance on test set
7. **Selection**: Choose best model (Logistic Regression)
8. **Export**: Save model, scaler, and metadata
## 🎯 Use Cases
- Deepfake audio detection
- Voice authentication systems
- Media verification tools
- Forensic audio analysis
- Content moderation platforms
## 🀝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## πŸ“ Citation
If you use this model, please cite:
```bibtex
@misc{deepfake_audio_classifier_2024,
author = {Your Name},
title = {Deepfake Audio Detection Model},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/hjsgfd/deepfake_audio_classifier}}
}
```
## πŸ™ Acknowledgments
- **Dataset**: [ud-nlp/real-vs-fake-human-voice-deepfake-audio](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Feature Extractor**: [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
- **Transformers Library**: Hugging Face
## πŸ“§ Contact
For questions or feedback, please open an issue on the repository.
---