Mrkomiljon committed on
Commit 3f2a057 · verified · 1 Parent(s): 10d748f

Update README.md

Files changed (1)
  1. README.md +141 -3
README.md CHANGED
@@ -1,3 +1,141 @@
- ---
- license: mit
- ---
---
license: mit
datasets:
- LanceaKing/asvspoof2019
language:
- en
metrics:
- accuracy
---
# DeepVoiceGuard: Real-Time Audio Authenticity Detection

**DeepVoiceGuard** is an AI-powered tool for detecting whether an audio file is genuine or AI-generated. Built on a RawNet-based architecture and trained on the ASVspoof dataset, the model is exported to ONNX for real-time inference.

---
## 🚀 Features
- **Real-Time Detection:** Analyze audio files quickly and efficiently to determine authenticity.
- **Sliding Window Processing:** Processes long audio files in fixed-length segments for accurate classification.
- **ONNX Optimized:** Faster inference than the original PyTorch checkpoint.
- **Interactive Demo:** Test the model using our Streamlit application.

---
## 📚 Model Overview
- **Architecture:** RawNet-based neural network
- **Frameworks:** PyTorch, ONNX
- **Dataset:** ASVspoof 2019 Challenge dataset (LA)
- **Classes:**
  - **Real:** genuine human speech
  - **Fake:** AI-generated or spoofed audio
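
Concretely, the inference code in this README feeds the model one 16 kHz segment of 64,600 samples (~4 s) at a time, batched to shape `(1, 64600)`, and reads two output scores (Fake vs. Real). A minimal numpy sketch of that fixed-length, repeat-padding input convention; the `preprocess` helper here is illustrative, mirroring the README's `pad`/`preprocess_audio_segment` logic:

```python
import numpy as np

def preprocess(segment, cut=64600):
    """Pad by repetition (or trim) to `cut` samples, then add a batch axis."""
    x = np.asarray(segment, dtype=np.float32)
    if x.shape[0] >= cut:
        x = x[:cut]                    # trim long segments
    else:
        reps = cut // x.shape[0] + 1   # repeat short segments until they fill cut
        x = np.tile(x, reps)[:cut]
    return x[None, :]                  # shape (1, cut), ready for the ONNX session

batch = preprocess(np.zeros(16000))    # a 1-second silent segment
print(batch.shape, batch.dtype)        # (1, 64600) float32
```

Repeating the segment (rather than zero-padding) keeps the signal statistics of short clips closer to those of full-length windows.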

---

## 🛠 Installation
Install the necessary dependencies:
```bash
pip install onnxruntime librosa numpy requests streamlit
```

## 🔧 How to Use
### Using the ONNX Model
```python
import librosa
import numpy as np
import onnxruntime as ort
import os
import requests
import streamlit as st

# Audio padding function
def pad(x, max_len=64600):
    """Pad or trim an audio segment to a fixed length by repeating or slicing."""
    x_len = x.shape[0]
    if x_len >= max_len:
        return x[:max_len]  # Trim if longer
    # Repeat the segment until it fills max_len
    num_repeats = (max_len // x_len) + 1
    return np.tile(x, num_repeats)[:max_len]

# Preprocess audio for a single segment
def preprocess_audio_segment(segment, cut=64600):
    """Pad or trim a single segment, then add a batch dimension."""
    segment = pad(segment, max_len=cut)
    return np.expand_dims(np.array(segment, dtype=np.float32), axis=0)

# Download ONNX model from Hugging Face
def download_model(url, local_path="RawNet_model.onnx"):
    """Download the ONNX model from a URL if it doesn't already exist locally."""
    if not os.path.exists(local_path):
        with st.spinner("Downloading ONNX model..."):
            response = requests.get(url)
            if response.status_code == 200:
                with open(local_path, "wb") as f:
                    f.write(response.content)
                st.success("Model downloaded successfully!")
            else:
                raise Exception("Failed to download ONNX model")
    return local_path

# Sliding window prediction function
def predict_with_sliding_window(audio_path, onnx_model_path, window_size=64600,
                                step_size=64600, sample_rate=16000):
    """Slide a fixed-length window over the audio and aggregate per-segment predictions."""
    # Load ONNX runtime session
    ort_session = ort.InferenceSession(onnx_model_path)

    # Load audio file
    waveform, _ = librosa.load(audio_path, sr=sample_rate)
    total_segments = []
    total_probabilities = []

    # Sliding window processing
    for start in range(0, len(waveform), step_size):
        segment = waveform[start:start + window_size]

        # Preprocess the segment
        audio_tensor = preprocess_audio_segment(segment)

        # Perform inference
        inputs = {ort_session.get_inputs()[0].name: audio_tensor}
        outputs = ort_session.run(None, inputs)
        probabilities = np.exp(outputs[0])  # Convert log-softmax outputs to probabilities
        prediction = np.argmax(probabilities)

        # Store the results
        predicted_class = "Real" if prediction == 1 else "Fake"
        total_segments.append(predicted_class)
        total_probabilities.append(probabilities[0][prediction])

    # Final aggregation
    majority_class = max(set(total_segments), key=total_segments.count)  # Majority voting
    avg_probability = np.mean(total_probabilities) * 100  # Average confidence in percent

    return majority_class, avg_probability

# Example
label, confidence = predict_with_sliding_window("example.wav", "RawNet_model.onnx")
print(f"Prediction: {label} ({confidence:.1f}%)")
```

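Because the step size equals the window size, segments do not overlap: a clip of `n` samples yields ⌈n / 64600⌉ per-segment votes, which are then majority-voted. A small pure-Python sketch of that arithmetic (`num_segments` is an illustrative helper, not part of the model code):

```python
STEP = 64600                        # non-overlapping: step == window size
SAMPLE_RATE = 16000                 # so each window covers ~4.04 s of audio

def num_segments(n_samples, step=STEP):
    """Votes the sliding-window loop casts: len(range(0, n_samples, step))."""
    return -(-n_samples // step)    # ceiling division, assumes n_samples > 0

print(num_segments(10 * SAMPLE_RATE))   # 160000 samples (~10 s) -> 3 votes
print(num_segments(3 * SAMPLE_RATE))    # 48000 samples -> 1 vote (segment padded up)

# Majority voting over per-segment labels, as in the final aggregation step
votes = ["Real", "Fake", "Real"]
print(max(set(votes), key=votes.count))  # Real
```

Note that with an even number of segments a tie is possible; `max` then picks one side arbitrarily, so very long files give the most reliable verdicts.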
## 📊 Performance Metrics
- Equal Error Rate (EER): 4.21%
- Accuracy: 95.8%
- ROC-AUC: 0.986

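For context, the Equal Error Rate is the operating point at which the false-acceptance rate (spoofed audio accepted as genuine) equals the false-rejection rate (genuine speech rejected). A minimal numpy sketch of estimating EER from a batch of scores; the scores and the `compute_eer` helper below are illustrative, not the model's actual outputs or evaluation code:

```python
import numpy as np

def compute_eer(scores, labels):
    """Estimate EER by sweeping a decision threshold over the scores.
    scores: higher = more likely genuine; labels: 1 = genuine, 0 = spoof."""
    genuine = scores[labels == 1]
    spoof = scores[labels == 0]
    best_far, best_frr = 1.0, 0.0
    for t in np.unique(scores):
        far = np.mean(spoof >= t)   # spoofed audio accepted as genuine
        frr = np.mean(genuine < t)  # genuine audio rejected
        if abs(far - frr) < abs(best_far - best_frr):
            best_far, best_frr = far, frr
    return (best_far + best_frr) / 2

# Synthetic, perfectly separable scores -> EER of 0
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
print(compute_eer(scores, labels))  # 0.0
```

A lower EER means fewer errors at the balanced operating point; the 4.21% above was measured on held-out ASVspoof data.
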
## 🛡 License
This project is licensed under the MIT License.

## ✉️ Contact
For inquiries or support, please contact:

- GitHub: [Mrkomiljon](https://github.com/Mrkomiljon/DeepVoiceGuard)
- Hugging Face: [DeepVoiceGuard](https://huggingface.co/spaces/Mrkomiljon/DeepVoiceGuard)