# Deepfake Audio Detection Model

A machine learning model for detecting deepfake/synthetic audio using Wav2Vec2 embeddings and classical ML classifiers.

[Model on Hugging Face](https://huggingface.co/hjsgfd/deepfake_audio_classifier) · [Python](https://www.python.org/downloads/) · [License: MIT](https://opensource.org/licenses/MIT)
## Model Performance

| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| **Logistic Regression** | **92.86%** | 0.95 | 0.93 | 0.93 |
| SVM | 85.71% | 0.89 | 0.86 | 0.85 |
| Random Forest | 78.57% | 0.85 | 0.79 | 0.76 |

**Best model: Logistic Regression, with 92.86% accuracy.**
## Approach

### 1. Dataset

- **Source**: [Real vs Fake Human Voice Deepfake Audio Dataset](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Size**: 70 audio samples
- **Classes**: 5 classes (0, 1, 2, 3, 4)
- **Distribution**: Perfectly balanced (14 samples per class)
### 2. Feature Extraction

We use **Wav2Vec2** (facebook/wav2vec2-base-960h) to extract deep audio embeddings:

- Pre-trained self-supervised model
- Extracts 768-dimensional feature vectors
- Captures semantic audio information
- Handles variable-length audio automatically

**Pipeline:**

```
Audio File → Wav2Vec2 → 768-dim Embedding → Classifier → Prediction
```
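Wav2Vec2 actually emits one 768-dimensional vector per short audio frame, so a single fixed-size clip embedding has to be pooled from the frame sequence. Mean-pooling over time is a common choice, though the exact pooling used in training is not documented here, so treat it as an assumption. A minimal NumPy sketch:

```python
import numpy as np

def pool_embedding(frame_embeddings: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, 768) Wav2Vec2 output into one 768-dim
    vector by averaging over the time axis. Works for any clip length."""
    assert frame_embeddings.ndim == 2 and frame_embeddings.shape[1] == 768
    return frame_embeddings.mean(axis=0)

# Example: a few seconds of audio yields a few hundred frames.
frames = np.random.rand(149, 768).astype(np.float32)
clip_embedding = pool_embedding(frames)
print(clip_embedding.shape)  # (768,)
```

Because the pooled vector has a fixed size regardless of clip length, any classical classifier can sit on top of it.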
### 3. Model Training

Three classifiers were trained and compared:

#### Logistic Regression (Best)

- **Accuracy**: 92.86%
- Multi-class classification with one-vs-rest (OvR) strategy
- Max iterations: 1000
- Features normalized with StandardScaler

#### SVM

- **Accuracy**: 85.71%
- RBF kernel
- Probability estimates enabled

#### Random Forest

- **Accuracy**: 78.57%
- 200 estimators
- Parallel processing enabled
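The three-way comparison can be sketched with scikit-learn as below. The random features stand in for the real Wav2Vec2 embeddings, and the hyperparameters mirror the ones listed above; the data and variable names are illustrative, not the project's actual training script.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 768))      # stand-in for 70 Wav2Vec2 embeddings
y = np.repeat(np.arange(5), 14)     # 5 balanced classes, 14 samples each

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_tr)  # fit on training data only
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

models = {
    # One-vs-rest strategy, as described above
    "LogisticRegression": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "SVM": SVC(kernel="rbf", probability=True),
    "RandomForest": RandomForestClassifier(n_estimators=200, n_jobs=-1),
}
for name, model in models.items():
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {acc:.2%}")
```

With random features the accuracies hover near chance; on real embeddings this loop reproduces the comparison in the table above.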
### 4. Preprocessing

- **Audio Loading**: Support for both URLs and local files
- **Resampling**: All audio converted to 16 kHz
- **Stereo to Mono**: Averaged across channels
- **Normalization**: StandardScaler on embeddings
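The mono conversion and 16 kHz resampling steps can be sketched as follows. Linear interpolation via `np.interp` is used here only as a self-contained stand-in; a real pipeline would more likely use `librosa.resample` or `soxr`, and the exact loader used in training is not shown in this README.

```python
import numpy as np

def to_mono_16k(audio: np.ndarray, sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Average stereo channels to mono, then resample to 16 kHz by linear
    interpolation (a rough stand-in for librosa.resample)."""
    if audio.ndim == 2:                  # (samples, channels) -> mono
        audio = audio.mean(axis=1)
    if sr != target_sr:
        duration = audio.shape[0] / sr
        n_out = int(round(duration * target_sr))
        t_in = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
        t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(t_out, t_in, audio)
    return audio.astype(np.float32)

stereo = np.random.rand(44_100, 2)       # 1 s of stereo audio at 44.1 kHz
mono = to_mono_16k(stereo, sr=44_100)
print(mono.shape)  # (16000,)
```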
## Quick Start

### Installation

```bash
pip install transformers torch librosa soundfile scikit-learn huggingface-hub requests numpy
```
### Usage

#### Simple Prediction

```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

# Initialize detector (downloads model files automatically)
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Predict from a URL
result = detector.predict("https://your-audio-file.wav", is_url=True)
print(f"Prediction: {result['label']} ({result['confidence']:.1%})")
```
#### Batch Prediction

```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Multiple URLs
audio_urls = [
    "https://example.com/audio1.wav",
    "https://example.com/audio2.wav",
    "https://example.com/audio3.wav",
]
results = detector.predict_batch(audio_urls, are_urls=True)

# Print results, skipping any entries that failed to process
for result in results:
    if 'prediction' in result:
        print(f"{result['audio_source']}: {result['label']} ({result['confidence']:.1%})")
```
#### Local Files

```python
# Single file
result = detector.predict("path/to/audio.wav", is_url=False)

# Multiple files
local_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = detector.predict_batch(local_files, are_urls=False)
```
## Model Files

The model consists of three files hosted on Hugging Face:

1. **deepfake_audio_classifier.pkl** - trained Logistic Regression classifier
2. **audio_scaler.pkl** - StandardScaler for feature normalization
3. **model_metadata.json** - model configuration and metadata

```json
{
  "model_type": "LogisticRegression",
  "accuracy": 0.9286,
  "feature_extractor": "facebook/wav2vec2-base-960h",
  "embedding_dim": 768,
  "num_classes": 5,
  "class_labels": {
    "0": "class_0",
    "1": "class_1",
    "2": "class_2",
    "3": "class_3",
    "4": "class_4"
  }
}
```
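Loading the three artifacts typically looks like the sketch below: `hf_hub_download` fetches each file by name, and `pickle`/`json` restore the objects. The download calls are commented out so the snippet runs offline against locally pickled stand-in artifacts; only the file names are taken from the list above.

```python
import json, os, pickle, tempfile
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
# For the real model, download each file from the Hub first:
# from huggingface_hub import hf_hub_download
# clf_path = hf_hub_download("hjsgfd/deepfake_audio_classifier",
#                            "deepfake_audio_classifier.pkl")

def load_artifacts(clf_path, scaler_path, meta_path):
    """Restore classifier, scaler, and metadata from their files."""
    with open(clf_path, "rb") as f:
        clf = pickle.load(f)
    with open(scaler_path, "rb") as f:
        scaler = pickle.load(f)
    with open(meta_path) as f:
        meta = json.load(f)
    return clf, scaler, meta

# Offline demo: create and save stand-in artifacts, then load them back.
tmp = tempfile.mkdtemp()
X = np.random.rand(20, 768)
y = np.arange(20) % 5
scaler = StandardScaler().fit(X)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)
for name, obj in [("deepfake_audio_classifier.pkl", clf),
                  ("audio_scaler.pkl", scaler)]:
    with open(os.path.join(tmp, name), "wb") as f:
        pickle.dump(obj, f)
with open(os.path.join(tmp, "model_metadata.json"), "w") as f:
    json.dump({"embedding_dim": 768, "num_classes": 5}, f)

clf2, scaler2, meta = load_artifacts(
    os.path.join(tmp, "deepfake_audio_classifier.pkl"),
    os.path.join(tmp, "audio_scaler.pkl"),
    os.path.join(tmp, "model_metadata.json"))
print(meta["embedding_dim"])  # 768
```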
## Detailed Results

### Training Configuration

- **Training samples**: 56 (80%)
- **Testing samples**: 14 (20%)
- **Feature dimension**: 768
- **Stratified split**: maintains class distribution
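With only 70 samples, the stratified 80/20 split matters: `train_test_split(..., stratify=y)` keeps the 14-sample test set close to the overall class balance, with 2-3 samples per class. The sizes below mirror the documented setup; the random seed is an arbitrary choice for illustration.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

y = np.repeat(np.arange(5), 14)   # 70 labels, 14 per class
X = np.zeros((70, 768))           # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(len(X_tr), len(X_te))           # 56 14
print(sorted(Counter(y_te).items()))  # 2-3 test samples per class
```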
### Logistic Regression Performance (Best Model)

```
              precision    recall  f1-score   support

     class_0       1.00      0.67      0.80         3
     class_1       1.00      1.00      1.00         2
     class_2       1.00      1.00      1.00         3
     class_3       0.75      1.00      0.86         3
     class_4       1.00      1.00      1.00         3

    accuracy                           0.93        14
   macro avg       0.95      0.93      0.93        14
weighted avg       0.95      0.93      0.93        14
```

### Key Metrics

- **Macro Average Precision**: 0.95
- **Macro Average Recall**: 0.93
- **Macro Average F1-Score**: 0.93
- **Overall Accuracy**: 92.86%
## Technical Details

### Dependencies

```
transformers>=4.30.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
scikit-learn>=1.3.0
huggingface-hub>=0.16.0
requests>=2.31.0
numpy>=1.24.0
```
### Model Architecture

```
Input: Audio File (any format supported by soundfile)
        ↓
Preprocessing (16 kHz, Mono)
        ↓
Wav2Vec2 Feature Extractor
        ↓
768-dimensional Embedding
        ↓
StandardScaler Normalization
        ↓
Logistic Regression Classifier
        ↓
Output: Class Prediction + Confidence Scores
```
### Supported Audio Formats

- WAV
- MP3
- FLAC
- OGG
- M4A
## Training Process

1. **Data Loading**: Load the dataset with auto-decoding disabled
2. **Feature Extraction**: Extract Wav2Vec2 embeddings (768-dim vectors)
3. **Train-Test Split**: 80/20 stratified split
4. **Normalization**: Fit StandardScaler on the training data
5. **Model Training**: Train three classifiers (Logistic Regression, SVM, Random Forest)
6. **Evaluation**: Compare performance on the test set
7. **Selection**: Choose the best model (Logistic Regression)
8. **Export**: Save the model, scaler, and metadata
## Use Cases

- Deepfake audio detection
- Voice authentication systems
- Media verification tools
- Forensic audio analysis
- Content moderation platforms
## Contributing

Contributions are welcome! Please feel free to submit a pull request.
## Citation

If you use this model, please cite:

```bibtex
@misc{deepfake_audio_classifier_2024,
  author       = {Your Name},
  title        = {Deepfake Audio Detection Model},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hjsgfd/deepfake_audio_classifier}}
}
```
## Acknowledgments

- **Dataset**: [ud-nlp/real-vs-fake-human-voice-deepfake-audio](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Feature Extractor**: [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
- **Transformers Library**: Hugging Face

## Contact

For questions or feedback, please open an issue on the repository.
---