satyaaa-m committed
Commit 75d3095 · 0 Parent(s)

First commit

Files changed (8)
  1. .dockerignore +9 -0
  2. .gitattributes +35 -0
  3. .gitignore +5 -0
  4. Dockerfile +12 -0
  5. README.md +136 -0
  6. app.py +131 -0
  7. detect.py +163 -0
  8. requirements.txt +11 -0
.dockerignore ADDED
@@ -0,0 +1,9 @@
+ __pycache__
+ venv
+ .git
+ .env
+ audio
+ *.pyc
+ *.pyo
+ *.pyd
+ .DS_Store
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,5 @@
+ audio/
+ __pycache__/
+ .env
+ *.pyc
+ .DS_Store
Dockerfile ADDED
@@ -0,0 +1,12 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ title: AudioShield AI Voice Detector
+ emoji: 🛡️
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ app_file: app.py
+ pinned: false
+ ---
+
+ # AudioShield AI: Voice Fraud Detection System
+
+ > **Problem Statement 01**: AI-Generated Voice Detection for Regional Languages
+
+ ![Architecture](Architechture.png)
+
+ ## 🚀 Overview
+ **AudioShield AI** is a REST API that detects AI-generated voice deepfakes. Built for the **GUVI Hackathon**, it specifically addresses the challenge of identifying synthetic audio in **Tamil, English, Hindi, Malayalam, and Telugu**.
+
+ Unlike single-model detectors, AudioShield uses a **Multi-Model Voting Ensemble**, aggregating the predictions of four state-of-the-art Wav2Vec2 models into a single, more reliable decision.
+
+ ## 🎯 Problem It Solves
+ With the rise of generative AI, voice scams and deepfakes are becoming indistinguishable from reality. Financial fraud, impersonation, and misinformation are growing threats. AudioShield provides a robust, scalable defense mechanism that can be integrated into calls, messaging apps, and verification systems.
+
+ ## ✨ Key Features
+ * **🛡️ Voting Ensemble Power**: Leverages 4 distinct AI models (MelodyMachine, Mo-Creator, Hemgg, Gustking-XLSR) to minimize false positives.
+ * **🌍 Multi-Lingual Support**: Optimized for Indian regional languages (Tamil, Telugu, Hindi, Malayalam) + English.
+ * **⚡ Zero Cold Start**: Implements a warm-up routine so the first API request is as fast as the 100th.
+ * **🚀 Render-Ready**: Configured for seamless deployment on cloud platforms like Render.
+ * **🔍 Explainable AI**: Provides detailed JSON responses with classification confidence and reasoning.
+
+ ## 🏗️ System Architecture
+ The system follows a **microservices-ready, layered architecture**:
+
+ ```mermaid
+ graph TD
+     User[Client / Postman] -->|"HTTP POST (Base64)"| API[FastAPI Service]
+     API -->|"Async Thread"| Engine[Detection Engine]
+
+     subgraph "Ensemble Committee (The AI Core)"
+         Engine -->|Input| M1[MelodyMachine]
+         Engine -->|Input| M2[Mo-Creator]
+         Engine -->|Input| M3[Hemgg]
+         Engine -->|Input| M4["Gustking (XLSR)"]
+
+         M1 -->|Vote| Agg[Weighted Aggregator]
+         M2 -->|Vote| Agg
+         M3 -->|Vote| Agg
+         M4 -->|Vote| Agg
+     end
+
+     Agg -->|Final Score| Verdict[Classification Logic]
+     Verdict -->|JSON Response| User
+ ```
+
+ ### Core Components
+ 1. **FastAPI Layer (`app.py`)**: Handles HTTP requests, validation, and async processing.
+ 2. **Detection Engine (`detect.py`)**: Manages model loading, inference, and the ensemble voting logic.
+ 3. **Models**:
+    * `MelodyMachine/Deepfake-audio-detection-V2`
+    * `mo-thecreator/Deepfake-audio-detection`
+    * `Hemgg/Deepfake-audio-detection`
+    * `Gustking/wav2vec2-large-xlsr-deepfake-audio-classification` (the "expert" model)
+
+ ## 🛠️ Tech Stack
+ * **Language**: Python 3.10+
+ * **API Framework**: FastAPI, Uvicorn
+ * **ML Libraries**: PyTorch, Transformers, Librosa, NumPy
+ * **Deployment**: Docker-ready, Render-compatible
+
+ ## 🚀 Installation & Usage
+
+ ### 1. Clone the Repository
+ ```bash
+ git clone https://github.com/krish1440/AI-Generated-Voice-Detection.git
+ cd AI-Generated-Voice-Detection
+ ```
+
+ ### 2. Install Dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Run the Server
+ ```bash
+ uvicorn app:app --host 0.0.0.0 --port 7860
+ ```
+ *`app.py` has no `__main__` block, so the server is launched via Uvicorn; the Docker image runs the same command on port `7860`.*
+ *Note: On the first run, the app downloads the model weights (approx. 2-3 GB).*
91
+ ## 🔌 API Documentation
92
+
93
+ ### Detect Voice
94
+ **Endpoint**: `POST /api/voice-detection`
95
+
96
+ **Request Body** (JSON):
97
+ ```json
98
+ {
99
+ "language": "Tamil",
100
+ "audioFormat": "mp3",
101
+ "audioBase64": "<Base64 encoded MP3 string>"
102
+ }
103
+ ```
104
+
105
+ **Response** (Success):
106
+ ```json
107
+ {
108
+ "status": "success",
109
+ "language": "Tamil",
110
+ "classification": "AI_GENERATED",
111
+ "confidenceScore": 0.98,
112
+ "explanation": "Ensemble Analysis: 4/4 models flagged this audio as AI-generated."
113
+ }
114
+ ```
115
+
116
+ **Response** (Error):
117
+ ```json
118
+ {
119
+ "status": "error",
120
+ "message": "Invalid Base64 encoding."
121
+ }
122
+ ```
123
+
+ ## ☁️ Deployment (Hugging Face Spaces)
+ This project is Dockerized for Hugging Face Spaces.
+
+ 1. Create a new **Space** on Hugging Face using the **Docker** SDK.
+ 2. Connect your GitHub repository.
+ 3. Hugging Face builds the image automatically from the `Dockerfile`.
+ 4. The API will be live at `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME/api/voice-detection`.
+
+ **Note**: For Spaces compliance, the image should install `ffmpeg` (needed to decode MP3) and run as user `1000`; the current `Dockerfile` does not yet do either.
+ > **Tip**: If the build fails with a registry error, try "Factory Reboot" in the Settings tab.
+
+ ---
+ *Developed for the GUVI Hackathon.*
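
The request/response contract documented above can be exercised with a short client sketch. This is illustrative, not part of the repo: `build_payload`, the `sample.mp3` path, and the localhost URL are assumptions; it only presumes the server is running on port `7860` as described.

```python
import base64
import json

def build_payload(audio_bytes: bytes, language: str = "Tamil") -> dict:
    """Wrap raw MP3 bytes in the JSON body expected by /api/voice-detection."""
    return {
        "language": language,
        "audioFormat": "mp3",
        "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
    }

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    # Hypothetical local file; substitute any MP3 clip.
    with open("sample.mp3", "rb") as f:
        payload = build_payload(f.read(), language="Tamil")

    resp = requests.post("http://localhost:7860/api/voice-detection", json=payload)
    print(json.dumps(resp.json(), indent=2))
```

A `400` with `"Invalid Base64 encoding."` from this endpoint usually means the audio bytes were sent raw instead of base64-encoded.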
app.py ADDED
@@ -0,0 +1,131 @@
+ import base64
+ import io
+ import os
+ import contextlib
+ import logging
+ from typing import Optional
+
+ import numpy as np
+ import soundfile as sf
+ import uvicorn
+ from fastapi import FastAPI, Header, HTTPException, Request, BackgroundTasks
+ from fastapi.responses import JSONResponse
+ from fastapi.concurrency import run_in_threadpool
+ from pydantic import BaseModel, Field
+
+ # Setup Logging
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - [AudioShield] - %(levelname)s - %(message)s')
+ logger = logging.getLogger(__name__)
+
+ # Import our detection engine
+ from detect import detector
+
+ # load_dotenv() - REMOVED per user request for Render deployment
+
+ # LIFESPAN MANAGER (resolves cold start)
+ @contextlib.asynccontextmanager
+ async def lifespan(app: FastAPI):
+     # Startup: block until a dummy inference has run, so the weights are
+     # loaded and on-device before the first real request arrives.
+     logger.info("--- Warming up the AI engine... ---")
+     try:
+         # Raw zero bytes are not decodable audio, so synthesize 1 s of
+         # silence as an in-memory WAV that librosa can actually load.
+         buf = io.BytesIO()
+         sf.write(buf, np.zeros(16000, dtype=np.float32), 16000, format="WAV")
+         await run_in_threadpool(detector.analyze_audio, buf.getvalue(), "English")
+         logger.info("--- AI Engine Ready & Warmed Up! ---")
+     except Exception as e:
+         logger.error(f"Warmup failed: {e}")
+
+     yield
+
+     # Shutdown Logic
+     logger.info("--- Shutting down AudioShield ---")
+
+ app = FastAPI(
+     title="AudioShield AI: Voice Fraud Detector",
+     version="2.0",
+     docs_url="/docs",
+     lifespan=lifespan
+ )
+
+ # CONFIGURATION
+ # Default key from problem statement example: sk_test_123456789
+ VALID_API_KEY = os.getenv("API_KEY", "sk_test_123456789")
+
+ # MODELS (strict adherence to spec)
+ class VoiceDetectionRequest(BaseModel):
+     language: str = Field(..., description="Language: Tamil, English, Hindi, Malayalam, Telugu")
+     # (?i:...) scopes the case-insensitive flag; a bare (?i) after ^ is an error on Python 3.11+.
+     audioFormat: str = Field(..., pattern="^(?i:mp3)$", description="Must be 'mp3'")
+     audioBase64: str = Field(..., description="Base64 encoded MP3 audio")
+
+ class VoiceDetectionResponse(BaseModel):
+     status: str
+     language: str
+     classification: str  # AI_GENERATED or HUMAN
+     confidenceScore: float
+     explanation: str
+
+ # ROUTES
+ @app.post("/api/voice-detection", response_model=VoiceDetectionResponse)
+ async def detect_voice(
+     request: VoiceDetectionRequest
+ ):
+     # 1. Security check - REMOVED for public access per user request
+     # logger.info(f"Public Access: Processing request for {request.language}")
+
+     try:
+         # 2. Basic validation (logic)
+         if request.audioFormat.lower() != "mp3":
+             # Belt-and-braces: the Pydantic pattern already enforces this
+             raise ValueError("Only MP3 format is supported.")
+
+         # 3. Decode Base64 (validate=True rejects non-alphabet characters
+         # instead of silently discarding them)
+         try:
+             audio_data = base64.b64decode(request.audioBase64, validate=True)
+         except Exception:
+             raise ValueError("Invalid Base64 encoding.")
+
+         if not audio_data:
+             raise ValueError("Empty audio data.")
+
+         # 4. Perform detection (non-blocking): run the synchronous
+         # detector.analyze_audio in a threadpool so the API remains
+         # responsive to other requests.
+         logger.info(f"Processing request for language: {request.language}")
+
+         result = await run_in_threadpool(detector.analyze_audio, audio_data, request.language)
+
+         if "error" in result:
+             # Map internal analysis failures onto the strict error response format.
+             raise ValueError(result["error"])
+
+         # 5. Return the formatted response (strict JSON)
+         return VoiceDetectionResponse(
+             status="success",
+             language=request.language,
+             classification=result["classification"],
+             confidenceScore=result["confidenceScore"],
+             explanation=result["explanation"]
+         )
+
+     except ValueError as ve:
+         logger.error(f"Validation Error: {ve}")
+         return JSONResponse(
+             status_code=400,
+             content={"status": "error", "message": str(ve)}
+         )
+     except Exception as e:
+         logger.error(f"Internal Error: {e}")
+         return JSONResponse(
+             status_code=500,
+             content={"status": "error", "message": "Internal server error processing audio."}
+         )
+
+ @app.get("/")
+ def health_check():
+     return {
+         "status": "online",
+         "service": "AudioShield AI (Hackathon Edition)",
+         "models_loaded": len(detector.pipelines) if hasattr(detector, 'pipelines') else 0
+     }
+
+ # Standard execution for HF Spaces (uvicorn launched via Docker CMD)
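
The `audioFormat` field is validated with a case-insensitive regex. A standalone sketch of that check with Python's `re` module (the constant name `PATTERN` is illustrative; note that a global `(?i)` placed after `^` is rejected on Python 3.11+, so the scoped form `(?i:...)` is the safe spelling):

```python
import re

# Case-insensitive match for "mp3" and nothing else, mirroring the
# Pydantic field constraint on audioFormat.
PATTERN = r"^(?i:mp3)$"

print(bool(re.match(PATTERN, "mp3")))  # True
print(bool(re.match(PATTERN, "MP3")))  # True
print(bool(re.match(PATTERN, "wav")))  # False
```

Scoping the flag keeps the rest of a larger pattern case-sensitive, which is why it is preferred over a pattern-wide flag here.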
detect.py ADDED
@@ -0,0 +1,163 @@
+ import io
+ import librosa
+ import numpy as np
+ import soundfile as sf
+ import torch
+ from transformers import pipeline
+
+ class AudioDetector:
+     def __init__(self):
+         print("--- [AudioDetector] Initializing 4-Model Ensemble System... ---")
+
+         # The committee of experts
+         self.models_config = [
+             {
+                 "id": "MelodyMachine/Deepfake-audio-detection-V2",
+                 "name": "MelodyMachine",
+                 "weight": 1.0
+             },
+             {
+                 "id": "mo-thecreator/Deepfake-audio-detection",
+                 "name": "Mo-Creator",
+                 "weight": 1.0
+             },
+             {
+                 "id": "Hemgg/Deepfake-audio-detection",
+                 "name": "Hemgg",
+                 "weight": 1.0
+             },
+             {
+                 "id": "Gustking/wav2vec2-large-xlsr-deepfake-audio-classification",
+                 "name": "Gustking-XLSR",
+                 "weight": 1.2  # Higher weight for the large model
+             }
+         ]
+
+         self.pipelines = []
+
+         for cfg in self.models_config:
+             try:
+                 print(f"--- Loading Model: {cfg['name']} ({cfg['id']}) ---")
+                 # Load pipeline
+                 p = pipeline("audio-classification", model=cfg['id'])
+                 self.pipelines.append({"pipe": p, "config": cfg})
+                 print(f"[+] Loaded {cfg['name']}")
+             except Exception as e:
+                 print(f"[-] Failed to load {cfg['name']}: {e}")
+
+         if not self.pipelines:
+             print("CRITICAL: No models could be loaded. Ensemble is empty.")
+
+     def analyze_audio(self, audio_data: bytes, language: str):
+         try:
+             # 1. Load audio (resampled to 16 kHz mono)
+             buffer = io.BytesIO(audio_data)
+             y, sr = librosa.load(buffer, sr=16000)
+
+             # 2. Extract features (for explanation context only).
+             # Preserved for generating professional justifications,
+             # but the DECISION is purely model-based.
+             centroid = np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))
+
+             # 3. Run the ensemble
+             votes = []
+             total_score = 0.0
+             total_weight = 0.0
+
+             print(f"\n--- Running Ensemble Inference on {len(self.pipelines)} models ---")
+
+             for item in self.pipelines:
+                 p = item['pipe']
+                 cfg = item['config']
+                 weight = cfg['weight']
+
+                 try:
+                     # Run inference; top_k=None returns scores for all labels
+                     results = p(y, top_k=None)
+
+                     # Parse the result for the AI probability
+                     ai_score = 0.0
+
+                     # Find the label that means "fake"
+                     ai_labels = ["fake", "spoof", "aivoice", "artificial", "generated"]
+
+                     for r in results:
+                         label_clean = r['label'].lower().strip()
+                         if label_clean in ai_labels:
+                             ai_score = r['score']
+                             break
+
+                     # Note: if no AI label is found (e.g. only 'real'/'human'), ai_score stays 0.0 (Human).
+                     # This covers label maps like {0: 'real', 1: 'fake'} where 'fake' is present.
+
+                     verdict = "AI" if ai_score > 0.5 else "HUMAN"
+
+                     # Weighted contribution
+                     votes.append({
+                         "name": cfg['name'],
+                         "ai_prob": ai_score,
+                         "verdict": verdict
+                     })
+
+                     total_score += ai_score * weight
+                     total_weight += weight
+
+                     print(f"  > {cfg['name']}: {ai_score:.4f} ({verdict})")
+
+                 except Exception as e:
+                     print(f"Error inferencing {cfg['name']}: {e}")
+
+             # 4. Final aggregation
+             if total_weight > 0:
+                 final_ensemble_score = total_score / total_weight
+             else:
+                 final_ensemble_score = 0.0  # Fail safe
+
+             is_ai = final_ensemble_score > 0.5
+             final_classification = "AI_GENERATED" if is_ai else "HUMAN"
+
+             # Confidence score: probability of the winning class
+             class_confidence = final_ensemble_score if is_ai else (1.0 - final_ensemble_score)
+
+             print(f"--- Final Ensemble Score: {final_ensemble_score:.4f} => {final_classification} (Conf: {class_confidence:.2f}) ---\n")
+
+             # 5. Construct the explanation,
+             # e.g. "3 out of 4 models detected deepfake artifacts..."
+             ai_votes_count = sum(1 for v in votes if v['verdict'] == 'AI')
+             total_models = len(votes)
+
+             explanations = []
+             explanations.append(f"Ensemble Analysis: {ai_votes_count}/{total_models} models flagged this audio as AI-generated.")
+             explanations.append(f"Aggregated Score: {final_ensemble_score*100:.1f}%.")
+
+             if is_ai:
+                 if centroid > 2000:
+                     explanations.append("High-frequency spectral artifacts consistent with neural vocoders detected.")
+                 else:
+                     explanations.append("Deep learning pattern matching identified non-biological features.")
+             else:
+                 explanations.append("Acoustic analysis confirms natural vocal resonance and organic production.")
+
+             final_explanation = " ".join(explanations)
+
+             return {
+                 "classification": final_classification,
+                 # Logical confidence: probability of the chosen class
+                 "confidenceScore": round(float(class_confidence), 2),
+                 "explanation": final_explanation
+             }
+
+         except Exception as e:
+             print(f"Analysis Failed: {e}")
+             return {
+                 "classification": "HUMAN",  # Fail safe
+                 "confidenceScore": 0.0,
+                 "error": str(e),
+                 "explanation": "Analysis failed due to internal error."
+             }
+
+ # Global instance
+ detector = AudioDetector()
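
The weighted-vote aggregation inside `analyze_audio` can be isolated as a small pure function. This is a sketch mirroring the logic above (`aggregate` is an illustrative name, not defined in the repo):

```python
def aggregate(model_scores):
    """Combine per-model AI probabilities into one verdict.

    model_scores: list of (ai_probability, weight) pairs, one per model.
    Follows the same rules as detect.py: weighted mean > 0.5 means
    AI_GENERATED, and the confidence is the probability of the winning class.
    """
    total_weight = sum(w for _, w in model_scores)
    if total_weight == 0:
        return "HUMAN", 0.0  # fail safe: no usable models

    score = sum(p * w for p, w in model_scores) / total_weight
    if score > 0.5:
        return "AI_GENERATED", round(score, 2)
    return "HUMAN", round(1.0 - score, 2)

# Three equal-weight models plus the 1.2-weighted XLSR model, as configured above:
print(aggregate([(0.9, 1.0), (0.8, 1.0), (0.4, 1.0), (0.95, 1.2)]))
# → ('AI_GENERATED', 0.77)
```

Because the XLSR model carries weight 1.2, its vote shifts the weighted mean more than any single 1.0-weight model, which is the intent of the "expert" weighting.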
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ fastapi
+ uvicorn
+ python-multipart
+ pydantic
+ librosa
+ numpy
+ torch
+ transformers
+ soundfile
+ accelerate
+