# 🎵 Deepfake Audio Detection Model

A machine learning model to detect deepfake/synthetic audio using Wav2Vec2 embeddings and classical ML classifiers.

[Model on Hugging Face](https://huggingface.co/hjsgfd/deepfake_audio_classifier) · [Python](https://www.python.org/downloads/) · [License: MIT](https://opensource.org/licenses/MIT)

## 📊 Model Performance

| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| **Logistic Regression** | **92.86%** | 0.95 | 0.93 | 0.93 |
| SVM | 85.71% | 0.89 | 0.86 | 0.85 |
| Random Forest | 78.57% | 0.85 | 0.79 | 0.76 |

**Best Model: Logistic Regression with 92.86% accuracy**

## 🎯 Approach

### 1. Dataset
- **Source**: [Real vs Fake Human Voice Deepfake Audio Dataset](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Size**: 70 audio samples
- **Classes**: 5 (labeled 0–4)
- **Distribution**: Perfectly balanced (14 samples per class)

### 2. Feature Extraction
We use **Wav2Vec2** (facebook/wav2vec2-base-960h) to extract deep audio embeddings:
- Pre-trained self-supervised model
- Extracts 768-dimensional feature vectors
- Captures semantic audio information
- Handles variable-length audio automatically

**Pipeline:**
```
Audio File → Wav2Vec2 → 768-dim Embedding → Classifier → Prediction
```

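In code, the embedding step of this pipeline might look like the sketch below. The `extract_embedding` helper name is ours, and mean-pooling the hidden states over time is an assumption — this README does not specify the exact pooling strategy.

```python
import librosa
import torch
from transformers import Wav2Vec2Model, Wav2Vec2Processor

MODEL_ID = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)
model.eval()

def extract_embedding(path: str) -> torch.Tensor:
    """Return one 768-dim embedding for an audio file."""
    waveform, sr = librosa.load(path, sr=16000, mono=True)  # 16 kHz mono
    inputs = processor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool over time -> (768,)
```
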
+
### 3. Model Training
|
| 40 |
+
Three classifiers were trained and compared:
|
| 41 |
+
|
| 42 |
+
#### Logistic Regression (Best)
|
| 43 |
+
- **Accuracy**: 92.86%
|
| 44 |
+
- Multi-class classification with OvR strategy
|
| 45 |
+
- Max iterations: 1000
|
| 46 |
+
- Features: StandardScaler normalized
|
| 47 |
+
|
| 48 |
+
#### SVM
|
| 49 |
+
- **Accuracy**: 85.71%
|
| 50 |
+
- RBF kernel
|
| 51 |
+
- Probability estimates enabled
|
| 52 |
+
|
| 53 |
+
#### Random Forest
|
| 54 |
+
- **Accuracy**: 78.57%
|
| 55 |
+
- 200 estimators
|
| 56 |
+
- Parallel processing enabled
|
| 57 |
+
|
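A minimal scikit-learn sketch of this comparison, using the hyperparameters listed above. The placeholder data and variable names are assumptions — substitute the real Wav2Vec2 embeddings and labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data; replace with the real embeddings (70 x 768) and labels.
X = np.random.randn(70, 768)
y = np.repeat(np.arange(5), 14)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf", probability=True),
    "Random Forest": RandomForestClassifier(n_estimators=200, n_jobs=-1),
}
for name, clf in models.items():
    clf.fit(X_train_s, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test_s)):.2%}")
```
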
### 4. Preprocessing
- **Audio Loading**: Support for both URLs and local files
- **Resampling**: All audio converted to 16 kHz
- **Stereo to Mono**: Averaged across channels
- **Normalization**: StandardScaler on embeddings

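A sketch of how these loading steps could be implemented. `load_audio` is a hypothetical helper, and fetching URLs with `requests` plus decoding with `soundfile` is an assumption about the approach:

```python
import io

import librosa
import numpy as np
import requests
import soundfile as sf

def load_audio(source: str, is_url: bool = False, target_sr: int = 16000):
    """Load audio from a URL or local path as 16 kHz mono float32."""
    if is_url:
        resp = requests.get(source, timeout=30)
        resp.raise_for_status()
        audio, sr = sf.read(io.BytesIO(resp.content))
    else:
        audio, sr = sf.read(source)
    if audio.ndim > 1:        # stereo -> mono by averaging channels
        audio = audio.mean(axis=1)
    if sr != target_sr:       # resample everything to 16 kHz
        audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
    return audio.astype(np.float32), target_sr
```
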
## 🚀 Quick Start

### Installation
```bash
pip install transformers torch librosa soundfile scikit-learn huggingface-hub requests numpy
```

### Usage

#### Simple Prediction
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

# Initialize detector (downloads model automatically)
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Predict from URL
result = detector.predict("https://your-audio-file.wav", is_url=True)
print(f"Prediction: {result['label']} ({result['confidence']:.1%})")
```

#### Batch Prediction
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Multiple URLs
audio_urls = [
    "https://example.com/audio1.wav",
    "https://example.com/audio2.wav",
    "https://example.com/audio3.wav",
]

results = detector.predict_batch(audio_urls, are_urls=True)

# Print results, skipping any entries that failed
for result in results:
    if 'prediction' in result:
        print(f"{result['audio_source']}: {result['label']} ({result['confidence']:.1%})")
```

#### Local Files
```python
# Single file
result = detector.predict("path/to/audio.wav", is_url=False)

# Multiple files
local_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = detector.predict_batch(local_files, are_urls=False)
```

## 📁 Model Files

The model consists of three files hosted on Hugging Face:

1. **deepfake_audio_classifier.pkl** - Trained Logistic Regression classifier
2. **audio_scaler.pkl** - StandardScaler for feature normalization
3. **model_metadata.json** - Model configuration and metadata:

```json
{
  "model_type": "LogisticRegression",
  "accuracy": 0.9286,
  "feature_extractor": "facebook/wav2vec2-base-960h",
  "embedding_dim": 768,
  "num_classes": 5,
  "class_labels": {
    "0": "class_0",
    "1": "class_1",
    "2": "class_2",
    "3": "class_3",
    "4": "class_4"
  }
}
```

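Loading these files by hand might look like the sketch below; using `huggingface_hub.hf_hub_download` plus `pickle` is an assumption based on the `.pkl` file names, not code taken from the repo:

```python
import json
import pickle

from huggingface_hub import hf_hub_download

REPO_ID = "hjsgfd/deepfake_audio_classifier"

clf_path = hf_hub_download(REPO_ID, "deepfake_audio_classifier.pkl")
scaler_path = hf_hub_download(REPO_ID, "audio_scaler.pkl")
meta_path = hf_hub_download(REPO_ID, "model_metadata.json")

with open(clf_path, "rb") as f:
    clf = pickle.load(f)        # LogisticRegression
with open(scaler_path, "rb") as f:
    scaler = pickle.load(f)     # StandardScaler
with open(meta_path) as f:
    metadata = json.load(f)

print(metadata["model_type"], metadata["accuracy"])
```
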
## 📈 Detailed Results

### Training Configuration
- **Training Samples**: 56 (80%)
- **Testing Samples**: 14 (20%)
- **Feature Dimension**: 768
- **Stratified Split**: Maintains class distribution

### Logistic Regression Performance (Best Model)
```
              precision    recall  f1-score   support

     class_0       1.00      0.67      0.80         3
     class_1       1.00      1.00      1.00         2
     class_2       1.00      1.00      1.00         3
     class_3       0.75      1.00      0.86         3
     class_4       1.00      1.00      1.00         3

    accuracy                           0.93        14
   macro avg       0.95      0.93      0.93        14
weighted avg       0.95      0.93      0.93        14
```

### Key Metrics
- **Macro Average Precision**: 0.95
- **Macro Average Recall**: 0.93
- **Macro Average F1-Score**: 0.93
- **Overall Accuracy**: 92.86%

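The report above is scikit-learn's standard format; continuing the earlier training sketch (and its assumed variable names), it can be regenerated like this:

```python
from sklearn.metrics import classification_report

best = models["Logistic Regression"]  # from the training sketch above
print(classification_report(
    y_test,
    best.predict(X_test_s),
    target_names=[f"class_{i}" for i in range(5)],
))
```
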
## 🔧 Technical Details

### Dependencies
```
transformers>=4.30.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
scikit-learn>=1.3.0
huggingface-hub>=0.16.0
requests>=2.31.0
numpy>=1.24.0
```

### Model Architecture
```
Input: Audio File (any format supported by soundfile)
        ↓
Preprocessing (16 kHz, Mono)
        ↓
Wav2Vec2 Feature Extractor
        ↓
768-dimensional Embedding
        ↓
StandardScaler Normalization
        ↓
Logistic Regression Classifier
        ↓
Output: Class Prediction + Confidence Scores
```

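Gluing the earlier sketches together, a hypothetical end-to-end `predict` mirroring this diagram could look like the following (assuming the `extract_embedding`, `clf`, `scaler`, and `metadata` names defined in the snippets above):

```python
import numpy as np

def predict(path: str) -> dict:
    """Classify one audio file; reuses helpers from the sketches above."""
    emb = extract_embedding(path).numpy().reshape(1, -1)  # (1, 768)
    probs = clf.predict_proba(scaler.transform(emb))[0]
    idx = int(np.argmax(probs))
    return {
        "prediction": idx,
        "label": metadata["class_labels"][str(idx)],
        "confidence": float(probs[idx]),
    }

print(predict("path/to/audio.wav"))  # placeholder path
```
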
### Supported Audio Formats
- WAV
- MP3
- FLAC
- OGG
- M4A

## 🔄 Training Process

1. **Data Loading**: Load the dataset with auto-decoding disabled
2. **Feature Extraction**: Extract Wav2Vec2 embeddings (768-dim vectors)
3. **Train-Test Split**: 80-20 stratified split
4. **Normalization**: Fit StandardScaler on the training data
5. **Model Training**: Train 3 classifiers (LR, RF, SVM)
6. **Evaluation**: Compare performance on the test set
7. **Selection**: Choose the best model (Logistic Regression)
8. **Export**: Save the model, scaler, and metadata (see the sketch below)

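A minimal sketch of the export step, assuming plain `pickle` serialization as the `.pkl` file names suggest and reusing names from the training sketch above:

```python
import json
import pickle

best_clf = models["Logistic Regression"]  # from the training sketch above

with open("deepfake_audio_classifier.pkl", "wb") as f:
    pickle.dump(best_clf, f)
with open("audio_scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)
with open("model_metadata.json", "w") as f:
    # Abbreviated; the full field list is shown under "Model Files".
    json.dump({"model_type": "LogisticRegression", "accuracy": 0.9286}, f, indent=2)
```
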
## 🎯 Use Cases

- Deepfake audio detection
- Voice authentication systems
- Media verification tools
- Forensic audio analysis
- Content moderation platforms

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{deepfake_audio_classifier_2024,
  author       = {Your Name},
  title        = {Deepfake Audio Detection Model},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hjsgfd/deepfake_audio_classifier}}
}
```

## 🙏 Acknowledgments

- **Dataset**: [ud-nlp/real-vs-fake-human-voice-deepfake-audio](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Feature Extractor**: [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
- **Transformers Library**: Hugging Face

## 📧 Contact

For questions or feedback, please open an issue on the repository.

---

**⚠️ Disclaimer**: This model is for research and educational purposes. Always verify critical audio authenticity through multiple methods.