# 🎡 Deepfake Audio Detection Model

A machine learning model that detects deepfake (synthetic) audio using Wav2Vec2 embeddings and classical ML classifiers.

[![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow)](https://huggingface.co/hjsgfd/deepfake_audio_classifier)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

## 📊 Model Performance

| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| **Logistic Regression** | **92.86%** | 0.95 | 0.93 | 0.93 |
| SVM | 85.71% | 0.89 | 0.86 | 0.85 |
| Random Forest | 78.57% | 0.85 | 0.79 | 0.76 |

All scores come from the same 14-sample held-out test set (precision, recall, and F1 are averaged over the five classes), so a single misclassified clip shifts accuracy by about 7.1 percentage points.

**Best model: Logistic Regression, 92.86% accuracy (13 of 14 test samples correct)**

19
+ ## 🎯 Approach
20
+
21
+ ### 1. Dataset
22
+ - **Source**: [Real vs Fake Human Voice Deepfake Audio Dataset](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
23
+ - **Size**: 70 audio samples
24
+ - **Classes**: 5 classes (0, 1, 2, 3, 4)
25
+ - **Distribution**: Perfectly balanced (14 samples per class)
26
+
### 2. Feature Extraction
We use **Wav2Vec2** (`facebook/wav2vec2-base-960h`) to extract deep audio embeddings:
- Self-supervised pre-trained model (this checkpoint is additionally fine-tuned on 960 hours of LibriSpeech for speech recognition)
- Produces 768-dimensional feature vectors
- Captures high-level acoustic and phonetic information
- Handles variable-length audio: every clip, whatever its duration, is mapped to one fixed-size embedding

**Pipeline:**
```
Audio File → Wav2Vec2 → 768-dim Embedding → Classifier → Prediction
```

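One common way to turn a clip of any length into a single 768-dimensional vector is to average Wav2Vec2's last hidden states over time. The README does not state which pooling is used, so mean pooling here is an assumption, as are the helper names `mean_pool`, `load_wav2vec2`, and `extract_embedding`; treat this as a sketch, not the code inside `predict_from_hf`.

```python
import numpy as np

MODEL_NAME = "facebook/wav2vec2-base-960h"


def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average a (frames, 768) matrix over time into one 768-dim vector (assumed pooling)."""
    return hidden_states.mean(axis=0)


def load_wav2vec2(model_name: str = MODEL_NAME):
    """Load the feature extractor and encoder once, then reuse them for every clip."""
    # Imported here so mean_pool works without torch/transformers installed.
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
    model = Wav2Vec2Model.from_pretrained(model_name)
    model.eval()
    return feature_extractor, model


def extract_embedding(waveform: np.ndarray, feature_extractor, model) -> np.ndarray:
    """Map a mono 16 kHz waveform of any length to a 768-dim embedding."""
    import torch

    inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        outputs = model(inputs.input_values)
    # last_hidden_state: (batch=1, frames, 768); the frame count grows with clip length
    hidden = outputs.last_hidden_state[0].cpu().numpy()
    return mean_pool(hidden)
```

Calling `load_wav2vec2()` once and then `extract_embedding(waveform, feature_extractor, model)` per clip avoids reloading the encoder for every file.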
### 3. Model Training
Three classifiers were trained and compared:

#### Logistic Regression (Best)
- **Accuracy**: 92.86%
- Multi-class classification with a one-vs-rest (OvR) strategy
- Max iterations: 1000
- Features: normalized with StandardScaler

#### SVM
- **Accuracy**: 85.71%
- RBF kernel
- Probability estimates enabled

#### Random Forest
- **Accuracy**: 78.57%
- 200 estimators
- Parallel processing enabled

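A minimal scikit-learn sketch of how the three classifiers above could be trained and compared, using the stratified 80/20 split and train-only StandardScaler described in this README. Settings not listed above are left at scikit-learn defaults, `random_state=42` is an assumption, and `build_classifiers`/`train_and_compare` are hypothetical helper names. Because recent scikit-learn versions deprecate `LogisticRegression(multi_class="ovr")`, the one-vs-rest strategy is expressed here with `OneVsRestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_classifiers(random_state: int = 42) -> dict:
    """The three configurations listed above; unlisted settings use scikit-learn defaults."""
    return {
        "Logistic Regression": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
        "SVM": SVC(kernel="rbf", probability=True, random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, n_jobs=-1, random_state=random_state
        ),
    }


def train_and_compare(X: np.ndarray, y: np.ndarray, random_state: int = 42):
    """Return per-model test accuracy, the best fitted model, and the fitted scaler."""
    # Stratified 80/20 split keeps each class's share in both halves
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    # Fit the scaler on training embeddings only to avoid test-set leakage
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    scores, fitted = {}, {}
    for name, clf in build_classifiers(random_state).items():
        clf.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, clf.predict(X_test))
        fitted[name] = clf
    best_name = max(scores, key=scores.get)
    return scores, fitted[best_name], scaler
```

With 70 embeddings, this split yields 56 training and 14 test samples, matching the configuration reported under Detailed Results.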
### 4. Preprocessing
- **Audio Loading**: Supports both URLs and local files
- **Resampling**: All audio is resampled to 16 kHz
- **Stereo to Mono**: Channels are averaged
- **Normalization**: StandardScaler applied to the embeddings

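The loading, resampling, and mono steps can be sketched as follows, assuming `soundfile` for decoding, `requests` for URL downloads, and `librosa.resample` for rate conversion (all listed dependencies). `load_audio` and `to_mono` are hypothetical names, not necessarily what `predict_from_hf` uses; embedding normalization happens later, after feature extraction.

```python
import io

import numpy as np

TARGET_SR = 16_000  # Wav2Vec2 expects 16 kHz input


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average channels; soundfile returns shape (frames, channels) for multichannel audio."""
    return audio.mean(axis=1) if audio.ndim == 2 else audio


def load_audio(source: str, is_url: bool = False) -> np.ndarray:
    """Load a local file or URL as a mono float32 waveform at 16 kHz."""
    import librosa
    import requests
    import soundfile as sf

    if is_url:
        response = requests.get(source, timeout=30)
        response.raise_for_status()
        audio, sr = sf.read(io.BytesIO(response.content), dtype="float32")
    else:
        audio, sr = sf.read(source, dtype="float32")

    audio = to_mono(audio)
    if sr != TARGET_SR:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR)
    return audio
```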
## 🚀 Quick Start

### Installation
```bash
pip install transformers torch librosa soundfile scikit-learn huggingface-hub requests numpy
```

### Usage

#### Simple Prediction
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

# Initialize the detector (downloads the model automatically)
detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Predict from a URL (replace with your own audio URL)
result = detector.predict("https://example.com/your-audio-file.wav", is_url=True)
print(f"Prediction: {result['label']} ({result['confidence']:.1%})")
```

#### Batch Prediction
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Multiple URLs
audio_urls = [
    "https://example.com/audio1.wav",
    "https://example.com/audio2.wav",
    "https://example.com/audio3.wav",
]

results = detector.predict_batch(audio_urls, are_urls=True)

# Print results (entries without a 'prediction' key are skipped)
for result in results:
    if 'prediction' in result:
        print(f"{result['audio_source']}: {result['label']} ({result['confidence']:.1%})")
```

#### Local Files
```python
from predict_from_hf import AudioDeepfakeDetectorFromHF

detector = AudioDeepfakeDetectorFromHF("hjsgfd/deepfake_audio_classifier")

# Single file
result = detector.predict("path/to/audio.wav", is_url=False)

# Multiple files
local_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = detector.predict_batch(local_files, are_urls=False)
```

## 📁 Model Files

The model consists of three files hosted on Hugging Face:

1. **deepfake_audio_classifier.pkl** - Trained Logistic Regression classifier
2. **audio_scaler.pkl** - StandardScaler for feature normalization
3. **model_metadata.json** - Model configuration and metadata
```json
{
  "model_type": "LogisticRegression",
  "accuracy": 0.9286,
  "feature_extractor": "facebook/wav2vec2-base-960h",
  "embedding_dim": 768,
  "num_classes": 5,
  "class_labels": {
    "0": "class_0",
    "1": "class_1",
    "2": "class_2",
    "3": "class_3",
    "4": "class_4"
  }
}
```

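A sketch of fetching and loading these three files, assuming they sit at the repository root and that the `.pkl` files were written with Python's `pickle` (if they were saved with `joblib`, use `joblib.load` instead). `snapshot_download` is a real `huggingface_hub` function; `download_artifacts` and `load_artifacts` are hypothetical helpers.

```python
import json
import pickle
from pathlib import Path

REPO_ID = "hjsgfd/deepfake_audio_classifier"
CLASSIFIER_FILE = "deepfake_audio_classifier.pkl"
SCALER_FILE = "audio_scaler.pkl"
METADATA_FILE = "model_metadata.json"


def download_artifacts(repo_id: str = REPO_ID) -> Path:
    """Download only the three model files and return the local snapshot folder."""
    from huggingface_hub import snapshot_download

    folder = snapshot_download(
        repo_id=repo_id, allow_patterns=[CLASSIFIER_FILE, SCALER_FILE, METADATA_FILE]
    )
    return Path(folder)


def load_artifacts(folder: Path):
    """Load classifier, scaler, and metadata from a local folder (assumes pickle format)."""
    # pickle can execute arbitrary code on load: only unpickle files you trust
    with open(folder / CLASSIFIER_FILE, "rb") as f:
        classifier = pickle.load(f)
    with open(folder / SCALER_FILE, "rb") as f:
        scaler = pickle.load(f)
    metadata = json.loads((folder / METADATA_FILE).read_text())
    return classifier, scaler, metadata
```

Typical use would be `classifier, scaler, metadata = load_artifacts(download_artifacts())`.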
## 📈 Detailed Results

### Training Configuration
- **Training Samples**: 56 (80%)
- **Testing Samples**: 14 (20%)
- **Feature Dimension**: 768
- **Stratified Split**: Maintains the class distribution

### Logistic Regression Performance (Best Model)
```
              precision    recall  f1-score   support

     class_0       1.00      0.67      0.80         3
     class_1       1.00      1.00      1.00         2
     class_2       1.00      1.00      1.00         3
     class_3       0.75      1.00      0.86         3
     class_4       1.00      1.00      1.00         3

    accuracy                           0.93        14
   macro avg       0.95      0.93      0.93        14
weighted avg       0.95      0.93      0.93        14
```

### Key Metrics
- **Macro Average Precision**: 0.95
- **Macro Average Recall**: 0.93
- **Macro Average F1-Score**: 0.93
- **Overall Accuracy**: 92.86%

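These macro averages can be checked by hand. From the per-class report, the only error is one `class_0` clip predicted as `class_3` (recall 0.67 for `class_0`, precision 0.75 for `class_3`, every other class perfect). The dependency-free sketch below rebuilds the per-class counts inferred from that report and recomputes the macro scores and accuracy.

```python
from statistics import mean

# (true positives, false positives, false negatives) per class,
# inferred from the report: one class_0 clip was predicted as class_3.
counts = {
    "class_0": (2, 0, 1),
    "class_1": (2, 0, 0),
    "class_2": (3, 0, 0),
    "class_3": (3, 1, 0),
    "class_4": (3, 0, 0),
}


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_class = [precision_recall_f1(*c) for c in counts.values()]
macro_precision = mean(p for p, _, _ in per_class)
macro_recall = mean(r for _, r, _ in per_class)
macro_f1 = mean(f for _, _, f in per_class)

support = sum(tp + fn for tp, _, fn in counts.values())       # 14 test clips
accuracy = sum(tp for tp, _, _ in counts.values()) / support  # 13 / 14

print(f"macro P={macro_precision:.2f} R={macro_recall:.2f} F1={macro_f1:.2f}")  # macro P=0.95 R=0.93 F1=0.93
print(f"accuracy={accuracy:.2%}")  # accuracy=92.86%
```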
## 🔧 Technical Details

### Dependencies
```
transformers>=4.30.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
scikit-learn>=1.3.0
huggingface-hub>=0.16.0
requests>=2.31.0
numpy>=1.24.0
```

### Model Architecture
```
Input: Audio File (any format supported by soundfile)
    ↓
Preprocessing (16 kHz, Mono)
    ↓
Wav2Vec2 Feature Extractor
    ↓
768-dimensional Embedding
    ↓
StandardScaler Normalization
    ↓
Logistic Regression Classifier
    ↓
Output: Class Prediction + Confidence Scores
```

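The last stages of this architecture (scaling, classification, and confidence scores) can be sketched as below. It assumes a fitted scaler and classifier exposing scikit-learn's `transform`, `predict_proba`, and `classes_`, plus the `class_labels` mapping from `model_metadata.json`. The result keys mirror those used in the usage examples (`prediction`, `label`, `confidence`), but `classify_embedding` is a hypothetical helper, not the actual `predict_from_hf` code.

```python
import numpy as np


def classify_embedding(embedding: np.ndarray, scaler, classifier, class_labels: dict) -> dict:
    """Scale one 768-dim embedding, classify it, and report the top class and its probability."""
    features = scaler.transform(embedding.reshape(1, -1))  # shape (1, 768)
    probabilities = classifier.predict_proba(features)[0]
    best = int(np.argmax(probabilities))
    predicted = classifier.classes_[best]
    return {
        "prediction": int(predicted),
        "label": class_labels[str(predicted)],  # metadata keys are strings "0".."4"
        "confidence": float(probabilities[best]),
        "probabilities": {
            class_labels[str(c)]: float(p) for c, p in zip(classifier.classes_, probabilities)
        },
    }
```

Returning the full probability vector alongside the top class makes it easy to flag low-confidence predictions for manual review.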
### Supported Audio Formats
- WAV
- MP3
- FLAC
- OGG
- M4A

## 📊 Training Process

1. **Data Loading**: Load the dataset with automatic audio decoding disabled
2. **Feature Extraction**: Extract Wav2Vec2 embeddings (768-dim vectors)
3. **Train-Test Split**: 80/20 stratified split
4. **Normalization**: Fit StandardScaler on the training data
5. **Model Training**: Train 3 classifiers (LR, RF, SVM)
6. **Evaluation**: Compare performance on the test set
7. **Selection**: Choose the best model (Logistic Regression)
8. **Export**: Save the model, scaler, and metadata

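Steps 1 and 8 can be sketched as below. The dataset ID comes from this README, while the `train` split and the `audio` column name are assumptions about the dataset layout; `Audio(decode=False)` keeps each clip as raw bytes plus a path instead of decoding it on access. The export writes the three files listed under Model Files, assuming `pickle` (not `joblib`) for the `.pkl` files. `load_undecoded_dataset` and `export_artifacts` are hypothetical helpers.

```python
import json
import pickle
from pathlib import Path

DATASET_ID = "ud-nlp/real-vs-fake-human-voice-deepfake-audio"


def load_undecoded_dataset(split: str = "train"):
    """Step 1: load the dataset without auto-decoding audio (assumed 'audio' column)."""
    from datasets import Audio, load_dataset

    dataset = load_dataset(DATASET_ID, split=split)
    # With decode=False each row's audio is a dict with "path" and "bytes" keys
    return dataset.cast_column("audio", Audio(decode=False))


def export_artifacts(classifier, scaler, accuracy: float, class_labels: dict,
                     out_dir: Path, model_type: str = "LogisticRegression") -> dict:
    """Step 8: save classifier, scaler, and metadata in the layout shown under Model Files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "deepfake_audio_classifier.pkl", "wb") as f:
        pickle.dump(classifier, f)
    with open(out_dir / "audio_scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)
    metadata = {
        "model_type": model_type,
        "accuracy": round(accuracy, 4),
        "feature_extractor": "facebook/wav2vec2-base-960h",
        "embedding_dim": 768,
        "num_classes": len(class_labels),
        "class_labels": class_labels,
    }
    (out_dir / "model_metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata
```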
## 🎯 Use Cases

- Deepfake audio detection
- Voice authentication systems
- Media verification tools
- Forensic audio analysis
- Content moderation platforms

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📝 Citation

If you use this model, please cite:
```bibtex
@misc{deepfake_audio_classifier_2024,
  author = {Your Name},
  title = {Deepfake Audio Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hjsgfd/deepfake_audio_classifier}}
}
```

## 🙏 Acknowledgments

- **Dataset**: [ud-nlp/real-vs-fake-human-voice-deepfake-audio](https://huggingface.co/datasets/ud-nlp/real-vs-fake-human-voice-deepfake-audio)
- **Feature Extractor**: [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
- **Transformers Library**: Hugging Face

## 📧 Contact

For questions or feedback, please open an issue on the repository.

---

**⚠️ Disclaimer**: This model is intended for research and educational purposes. Always verify critical audio authenticity through multiple methods.