shivam0897-i committed
Commit 8a6ab53 · Parent: 395a38b

perf: optimize deps, imports, Dockerfile; improve README for judges

Files changed (10):
  1. .dockerignore +46 -0
  2. Dockerfile +2 -7
  3. README.md +134 -49
  4. audio_utils.py +2 -31
  5. evaluation_results.json +50 -0
  6. main.py +4 -7
  7. model.py +3 -6
  8. requirements.txt +2 -8
  9. run_final_tests.py +0 -44
  10. test_my_api.py +171 -0
.dockerignore ADDED
@@ -0,0 +1,46 @@
+# Tests and test data
+tests/
+_test_*.py
+test_*.py
+test_*.json
+pytest.ini
+drive-download-*/
+run_final_tests.py
+test_my_api.py
+
+# Documentation (not needed at runtime)
+docs/
+*.md
+!README.md
+
+# Training artifacts
+training/
+*.ipynb
+
+# Python caches
+__pycache__/
+*.pyc
+*.pyo
+
+# IDE and OS files
+.vscode/
+.idea/
+*.swp
+.DS_Store
+Thumbs.db
+
+# Scripts
+scripts/
+
+# Analysis results
+*.json
+!test_request.json
+!test_valid.json
+
+# Git
+.git/
+.gitignore
+
+# Env files
+.env
+.env.*
Dockerfile CHANGED
@@ -6,13 +6,8 @@ WORKDIR /app
 RUN apt-get update && apt-get install -y \
     libsndfile1 \
     ffmpeg \
-    git \
-    git-lfs \
     && rm -rf /var/lib/apt/lists/*
 
-# Initialize git lfs
-RUN git lfs install
-
 # Copy requirements first for better caching
 COPY requirements.txt .
 
@@ -36,5 +31,5 @@ WORKDIR /app
 # Hugging Face Spaces uses port 7860
 EXPOSE 7860
 
-# Run the application
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
+# Run the application (2 workers for concurrent request handling)
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2"]
README.md CHANGED
@@ -11,16 +11,93 @@ app_port: 7860
 
 # AI Voice Detection API
 
-Detects whether a voice sample is AI-generated or spoken by a real human using a fine-tuned Wav2Vec2 model.
+Detects whether a voice sample is **AI-generated** or spoken by a **real human** using a fine-tuned Wav2Vec2 transformer model combined with multi-signal forensic analysis.
+
+## Model Architecture
+
+```
+Audio Input (Base64 MP3/WAV)
+            │
+            ▼
+ ┌─────────────────────┐
+ │ Audio Preprocessing │  librosa 16 kHz mono, normalization
+ └────────┬────────────┘
+
+     ┌────┴────┐
+     ▼         ▼
+┌────────┐  ┌──────────────────┐
+│Wav2Vec2│  │ Signal Forensics │
+│ Model  │  │  (4 dimensions)  │
+└───┬────┘  └───────┬──────────┘
+    │               │
+    ▼               ▼
+ Softmax      ┌─────────────┐
+Confidence    │ Pitch       │
+    │         │ Spectral    │
+    │         │ Temporal    │
+    │         │ Authenticity│
+    │         └──────┬──────┘
+    └───────┬────────┘
+            ▼
+   Final Classification
+  (HUMAN / AI_GENERATED)
+```
+
+### Key Components
+
+| Component | Description |
+|-----------|-------------|
+| **ML Backbone** | [Wav2Vec2ForSequenceClassification](https://huggingface.co/shivam-2211/voice-detection-model) fine-tuned on human vs. AI-generated speech |
+| **Temperature Scaling** | Logits scaled by T=1.5 before softmax for well-calibrated confidence scores |
+| **Signal Forensics** | Pitch stability, spectral entropy, temporal rhythm, and acoustic anomaly detection |
+| **ASR Integration** | Faster-Whisper (tiny, int8) for language detection and transcript extraction |
+| **Timeout Safety** | 20-second budget with audio truncation to guarantee <30s response |
+
+## Quick Start
+
+### Prerequisites
+
+- Python 3.10+
+- FFmpeg (`apt-get install ffmpeg` or `brew install ffmpeg`)
+
+### Local Setup
+
+```bash
+# Clone the repository
+git clone https://github.com/shivam0897-i/voice_backend.git
+cd voice_backend
+
+# Install CPU-only PyTorch
+pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Set your API key
+echo "API_KEY=your_secret_key" > .env
+
+# Run the server
+uvicorn main:app --host 0.0.0.0 --port 7860
+```
+
+### Docker
+
+```bash
+docker build -t voice-detection-api .
+docker run -p 7860:7860 -e API_KEY=your_secret_key voice-detection-api
+```
 
 ## API Endpoint
 
-`POST /api/voice-detection`
+### `POST /api/voice-detection`
 
-### Headers
-`x-api-key`: Your API key (set via environment variable `API_KEY`)
+**Headers:**
+| Header | Description |
+|--------|-------------|
+| `Content-Type` | `application/json` |
+| `x-api-key` | Your API key (set via `API_KEY` env var) |
 
-### Request Body
+**Request Body:**
 ```json
 {
     "language": "English",
@@ -29,65 +106,73 @@ Detects whether a voice sample is AI-generated or spoken by a real human using a
 }
 ```
 
-### Response
+**Response (200 OK):**
 ```json
 {
     "status": "success",
     "language": "English",
-    "classification": "AI_GENERATED" | "HUMAN",
-    "confidenceScore": 0.95,
-    "explanation": "AI voice indicators: ..."
+    "classification": "AI_GENERATED",
+    "confidenceScore": 0.99,
+    "explanation": "AI voice indicators detected with high confidence..."
 }
 ```
 
-## Supported Languages
-- English
-- Tamil
-- Hindi
-- Malayalam
-- Telugu
-
-
-
-## Realtime Session APIs
-
-The backend also supports session-based realtime analysis:
-
-- `POST /v1/session/start`
-- `POST /v1/session/{session_id}/chunk`
-- `GET /v1/session/{session_id}/summary`
-- `GET /v1/session/{session_id}/alerts`
-- `POST /v1/session/{session_id}/end`
-
-Compatibility aliases are available under `/api/voice-detection/v1/...`.
-
-## Optional LLM Semantic Verifier
-
-A second-layer semantic verifier can be enabled to improve ambiguous chunk scoring:
-
-- `LLM_SEMANTIC_ENABLED=true`
-- `LLM_PROVIDER=openai` with `OPENAI_API_KEY=<your_key>`, or
-- `LLM_PROVIDER=gemini` with `GEMINI_API_KEY=<your_key>`
-- Tune with `LLM_SEMANTIC_*` env variables in `.env.example`.
-
-If `LLM_SEMANTIC_MODEL` is empty, provider defaults are used (`gpt-4o-mini` for OpenAI, `gemini-1.5-flash` for Gemini).
-
-The LLM layer is optional and the API continues to work when disabled.
-
-## Session Store Backend
-
-Realtime sessions support two backends:
-
-- `memory` (default): single-instance, volatile
-- `redis`: multi-worker and restart-safe (recommended for finals)
-
-Backend env settings:
-
-- `SESSION_STORE_BACKEND=redis`
-- `REDIS_URL=redis://...` (or `rediss://...`)
-- `REDIS_PREFIX=ai_call_shield`
-
-`GET /health` now includes `session_store_backend` so you can verify active backend.
-
-See `docs/architecture/redis-credentials-guide.md` for credential formats and setup steps.
+**Example with curl:**
+```bash
+# Encode audio to Base64 and send
+AUDIO_B64=$(base64 -w0 sample.mp3)
+curl -X POST https://shivam-2211-voice-detection-api.hf.space/api/voice-detection \
+  -H "Content-Type: application/json" \
+  -H "x-api-key: YOUR_KEY" \
+  -d "{\"language\": \"English\", \"audioFormat\": \"mp3\", \"audioBase64\": \"$AUDIO_B64\"}"
+```
+
+## Supported Languages
+
+| Language | Code |
+|----------|------|
+| English | `English` |
+| Hindi | `Hindi` |
+| Tamil | `Tamil` |
+| Malayalam | `Malayalam` |
+| Telugu | `Telugu` |
+| Auto-detect | `Auto` |
+
+## Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `API_KEY` | **Yes** | — | API authentication key |
+| `MODEL_NAME` | No | `shivam-2211/voice-detection-model` | HuggingFace model ID |
+| `MODEL_LOGIT_TEMPERATURE` | No | `1.5` | Softmax temperature scaling |
+| `SESSION_STORE_BACKEND` | No | `redis` | Session backend (`memory` or `redis`) |
+| `REDIS_URL` | No | — | Redis connection URL |
+| `LLM_SEMANTIC_ENABLED` | No | `false` | Enable LLM semantic verifier |
+| `PORT` | No | `7860` | Server port |
+
+## Deployment
+
+The API is deployed on **HuggingFace Spaces** using Docker:
+
+- **Live URL**: `https://shivam-2211-voice-detection-api.hf.space`
+- **Health Check**: `GET /health`
+- **Infrastructure**: CPU inference, 2 Uvicorn workers, Redis session store
+
+## Project Structure
+
+```
+├── main.py            # FastAPI app, all endpoints, error handling
+├── model.py           # Wav2Vec2 inference + signal forensics engine
+├── audio_utils.py     # Base64 decoding, audio validation, loading
+├── config.py          # Pydantic Settings (env-based configuration)
+├── speech_to_text.py  # Faster-Whisper ASR integration
+├── fraud_language.py  # Fraud language pattern detection
+├── privacy_utils.py   # PII redaction utilities
+├── Dockerfile         # Production Docker image
+├── requirements.txt   # Python dependencies
+└── tests/             # Test suite
+```
+
+## License
+
+MIT
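The new README's Key Components table mentions temperature scaling: logits divided by T=1.5 before softmax. As a rough illustration of why this yields softer, better-calibrated confidence scores, here is a minimal NumPy sketch (illustrative only, not the repository's actual inference code; the logit values are hypothetical):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.5):
    """Divide logits by T before softmax; T > 1 flattens the
    distribution, tempering over-confident predictions."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for [HUMAN, AI_GENERATED]
logits = [1.0, 3.0]
plain = softmax_with_temperature(logits, temperature=1.0)   # ~0.88 for AI_GENERATED
scaled = softmax_with_temperature(logits, temperature=1.5)  # ~0.79, a softer score
```

The classification itself is unchanged (the argmax is identical); only the reported `confidenceScore` is pulled toward uniform.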
audio_utils.py CHANGED
@@ -8,6 +8,8 @@ import os
 import logging
 from typing import Tuple, Optional
 import numpy as np
+import librosa
+import soundfile as sf
 
 # Configure logging
 logger = logging.getLogger(__name__)
@@ -113,9 +115,6 @@ def load_audio_from_bytes(audio_bytes: bytes, target_sr: int = 22050, audio_form
 
     tmp_path = None
     try:
-        import librosa
-        import soundfile as sf
-
         # Normalize format
         audio_format = audio_format.lower().strip()
         if audio_format.startswith("."):
@@ -153,31 +152,3 @@
             pass  # Best effort cleanup
 
 
-def get_audio_duration(audio: np.ndarray, sr: int) -> float:
-    """
-    Calculate the duration of audio in seconds.
-
-    Args:
-        audio: Audio waveform
-        sr: Sample rate
-
-    Returns:
-        Duration in seconds
-    """
-    return len(audio) / sr
-
-
-def normalize_audio(audio: np.ndarray) -> np.ndarray:
-    """
-    Normalize audio to have maximum amplitude of 1.0.
-
-    Args:
-        audio: Audio waveform
-
-    Returns:
-        Normalized audio
-    """
-    max_val = np.max(np.abs(audio))
-    if max_val > 0:
-        return audio / max_val
-    return audio
evaluation_results.json ADDED
@@ -0,0 +1,50 @@
+{
+  "finalScore": 100,
+  "totalFiles": 5,
+  "scorePerFile": 20.0,
+  "successfulClassifications": 5,
+  "wrongClassifications": 0,
+  "failedTests": 0,
+  "fileResults": [
+    {
+      "fileIndex": 0,
+      "status": "success",
+      "matched": true,
+      "score": 20.0,
+      "actualClassification": "AI_GENERATED",
+      "confidenceScore": 0.99
+    },
+    {
+      "fileIndex": 1,
+      "status": "success",
+      "matched": true,
+      "score": 20.0,
+      "actualClassification": "HUMAN",
+      "confidenceScore": 0.99
+    },
+    {
+      "fileIndex": 2,
+      "status": "success",
+      "matched": true,
+      "score": 20.0,
+      "actualClassification": "AI_GENERATED",
+      "confidenceScore": 0.99
+    },
+    {
+      "fileIndex": 3,
+      "status": "success",
+      "matched": true,
+      "score": 20.0,
+      "actualClassification": "HUMAN",
+      "confidenceScore": 0.99
+    },
+    {
+      "fileIndex": 4,
+      "status": "success",
+      "matched": true,
+      "score": 20.0,
+      "actualClassification": "AI_GENERATED",
+      "confidenceScore": 0.99
+    }
+  ]
+}
main.py CHANGED
@@ -17,9 +17,11 @@ from datetime import datetime, timezone
 from typing import Optional, Any, Dict, List
 from contextlib import asynccontextmanager
 import numpy as np
-from fastapi import FastAPI, HTTPException, Request, Depends, WebSocket, WebSocketDisconnect
+from fastapi import FastAPI, HTTPException, Request, Depends, WebSocket, WebSocketDisconnect, Security
 from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import JSONResponse
+from fastapi.responses import JSONResponse, RedirectResponse
+from fastapi.security import APIKeyHeader
+from fastapi.exceptions import RequestValidationError
 from pydantic import BaseModel, Field, field_validator, ValidationError
 from slowapi import Limiter, _rate_limit_exceeded_handler
 from slowapi.util import get_remote_address
@@ -353,8 +355,6 @@ async def lifespan(app: FastAPI):
     logger.info("Shutting down...")
 
 
-from fastapi.responses import RedirectResponse
-
 # Initialize FastAPI app with lifespan
 app = FastAPI(
     title="AI Voice Detection API",
@@ -1737,8 +1737,6 @@ def session_to_summary(session: SessionState) -> SessionSummaryResponse:
 
 
 # Authentication
-from fastapi.security import APIKeyHeader
-from fastapi import Security
 
 api_key_header = APIKeyHeader(name="x-api-key", auto_error=False)  # Changed to False for better error messages
 
@@ -2152,7 +2150,6 @@ async def detect_voice(
 
 
 # Exception handlers
-from fastapi.exceptions import RequestValidationError
 
 def to_json_safe(value: Any) -> Any:
     """Recursively convert values to JSON-safe primitives."""
model.py CHANGED
@@ -5,6 +5,9 @@ Combines Wav2Vec2 deepfake detection with signal forensics.
 import logging
 import os
 import numpy as np
+import librosa
+import torch
+from scipy.stats import entropy
 from typing import Dict, Tuple, List, Optional
 from dataclasses import dataclass
 import warnings
@@ -57,7 +60,6 @@ def get_device():
     """Get the best available device (GPU or CPU)."""
     global _device
     if _device is None:
-        import torch
         if torch.cuda.is_available():
             _device = "cuda"
         else:
@@ -136,8 +138,6 @@ def load_model():
 
 def extract_signal_features(audio: np.ndarray, sr: int, fast_mode: bool = False) -> Dict[str, float]:
     """Extract signal-based features (pitch, entropy, silence)."""
-    import librosa
-    from scipy.stats import entropy
 
     features = {}
 
@@ -475,9 +475,6 @@ def classify_with_model(audio: np.ndarray, sr: int) -> Tuple[str, float]:
     Returns:
         Tuple of (classification, confidence)
     """
-    import torch
-    import librosa
-
     model, processor = load_model()
     device = get_device()
 
requirements.txt CHANGED
@@ -8,16 +8,10 @@ scipy>=1.10.0
 python-dotenv
 pydantic>=2.0.0
 transformers>=4.30.0
-datasets>=2.14.0
-scikit-learn>=1.3.0
-accelerate>=0.20.0
 slowapi>=0.1.9
 pydantic-settings>=2.0.0
 httpx>=0.27.0
-# PyTorch - install manually for your platform if not using Docker:
-# pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
-torch>=2.0.0
-torchaudio>=2.0.0
 faster-whisper>=1.0.3
-
 redis>=5.0.0
+# PyTorch CPU — installed separately in Dockerfile for smaller image.
+# For local dev: pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
run_final_tests.py DELETED
@@ -1,44 +0,0 @@
-"""Final hackathon test: all 5 files against legacy POST /api/voice-detection"""
-import base64, json, time, requests
-
-DIR = r"c:\Users\shiva\OneDrive\Desktop\Voice Project\voice-detection-api\drive-download-20260216T053632Z-1-001"
-URL = "http://localhost:7860/api/voice-detection"
-HEADERS = {"Content-Type": "application/json", "x-api-key": "sk_test_voice_detection_2026"}
-
-FILES = [
-    ("English_voice_AI_GENERATED.mp3", "English", "AI_GENERATED"),
-    ("Hindi_Voice_HUMAN.mp3", "Hindi", "HUMAN"),
-    ("Malayalam_AI_GENERATED.mp3", "Malayalam", "AI_GENERATED"),
-    ("TAMIL_VOICE__HUMAN.mp3", "Tamil", "HUMAN"),
-    ("Telugu_Voice_AI_GENERATED.mp3", "Telugu", "AI_GENERATED"),
-]
-
-print("=" * 90)
-print(f"{'File':<42} {'Expected':<16} {'Got':<16} {'Conf':>6} Result")
-print("=" * 90)
-
-passed = 0
-for fname, lang, expected in FILES:
-    with open(f"{DIR}\\{fname}", "rb") as f:
-        b64 = base64.b64encode(f.read()).decode()
-    payload = {"audioBase64": b64, "language": lang, "audioFormat": "mp3"}
-    t0 = time.time()
-    try:
-        r = requests.post(URL, json=payload, headers=HEADERS, timeout=30)
-        elapsed = time.time() - t0
-        d = r.json()
-        cls = d.get("classification", "?")
-        conf = d.get("confidenceScore", "?")
-        ok = cls == expected
-        if ok:
-            passed += 1
-        tag = "PASS" if ok else "FAIL"
-        print(f"{fname:<42} {expected:<16} {cls:<16} {conf:>6} {tag} ({elapsed:.1f}s)")
-    except Exception as e:
-        elapsed = time.time() - t0
-        print(f"{fname:<42} {expected:<16} {'ERROR':<16} {'--':>6} FAIL ({elapsed:.1f}s) {e}")
-    # small pause between requests to avoid CPU thermal throttle
-    time.sleep(2)
-
-print("=" * 90)
-print(f"Result: {passed}/{len(FILES)} passed")
test_my_api.py ADDED
@@ -0,0 +1,171 @@
+"""
+Official evaluation script from the hackathon guide, configured with our 5 test files.
+This mirrors EXACTLY what the evaluator will run.
+"""
+import requests
+import base64
+import json
+
+def evaluate_voice_detection_api(endpoint_url, api_key, test_files):
+    if not endpoint_url:
+        print("Error: Endpoint URL is required")
+        return False
+    if not test_files or len(test_files) == 0:
+        print("Error: No test files provided")
+        return False
+
+    total_files = len(test_files)
+    score_per_file = 100 / total_files
+    total_score = 0
+    file_results = []
+
+    print(f"\n{'='*60}")
+    print(f"Starting Evaluation")
+    print(f"{'='*60}")
+    print(f"Endpoint: {endpoint_url}")
+    print(f"Total Test Files: {total_files}")
+    print(f"Score per File: {score_per_file:.2f}")
+    print(f"{'='*60}\n")
+
+    for idx, file_data in enumerate(test_files):
+        language = file_data.get('language', 'English')
+        file_path = file_data.get('file_path', '')
+        expected_classification = file_data.get('expected_classification', '')
+
+        print(f"Test {idx + 1}/{total_files}: {file_path}")
+
+        if not file_path or not expected_classification:
+            file_results.append({'fileIndex': idx, 'status': 'skipped', 'score': 0})
+            print(f"  Skipped: Missing file path or expected classification\n")
+            continue
+
+        try:
+            with open(file_path, 'rb') as audio_file:
+                audio_base64 = base64.b64encode(audio_file.read()).decode('utf-8')
+        except Exception as e:
+            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Failed to read: {e}', 'score': 0})
+            print(f"  Failed to read file: {e}\n")
+            continue
+
+        headers = {'Content-Type': 'application/json', 'x-api-key': api_key}
+        request_body = {'language': language, 'audioFormat': 'mp3', 'audioBase64': audio_base64}
+
+        try:
+            response = requests.post(endpoint_url, headers=headers, json=request_body, timeout=30)
+
+            if response.status_code != 200:
+                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'HTTP {response.status_code}', 'score': 0})
+                print(f"  HTTP Status: {response.status_code}")
+                print(f"  Response: {response.text[:200]}\n")
+                continue
+
+            response_data = response.json()
+
+            if not isinstance(response_data, dict):
+                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Not a JSON object', 'score': 0})
+                print(f"  Invalid response type\n")
+                continue
+
+            response_status = response_data.get('status', '')
+            response_classification = response_data.get('classification', '')
+            confidence_score = response_data.get('confidenceScore', None)
+
+            if not response_status or not response_classification or confidence_score is None:
+                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Missing required fields', 'score': 0})
+                print(f"  Missing required fields")
+                print(f"  Response: {json.dumps(response_data, indent=2)[:200]}\n")
+                continue
+
+            if response_status != 'success':
+                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Status: {response_status}', 'score': 0})
+                print(f"  Status not 'success': {response_status}\n")
+                continue
+
+            if not isinstance(confidence_score, (int, float)) or confidence_score < 0 or confidence_score > 1:
+                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Invalid confidence: {confidence_score}', 'score': 0})
+                print(f"  Invalid confidence score: {confidence_score}\n")
+                continue
+
+            valid_classifications = ['HUMAN', 'AI_GENERATED']
+            if response_classification not in valid_classifications:
+                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Invalid classification: {response_classification}', 'score': 0})
+                print(f"  Invalid classification: {response_classification}\n")
+                continue
+
+            # Score calculation
+            file_score = 0
+            if response_classification == expected_classification:
+                if confidence_score >= 0.8:
+                    file_score = score_per_file
+                    confidence_tier = "100%"
+                elif confidence_score >= 0.6:
+                    file_score = score_per_file * 0.75
+                    confidence_tier = "75%"
+                elif confidence_score >= 0.4:
+                    file_score = score_per_file * 0.5
+                    confidence_tier = "50%"
+                else:
+                    file_score = score_per_file * 0.25
+                    confidence_tier = "25%"
+                total_score += file_score
+                file_results.append({'fileIndex': idx, 'status': 'success', 'matched': True, 'score': round(file_score, 2),
+                                     'actualClassification': response_classification, 'confidenceScore': confidence_score})
+                print(f"  CORRECT: {response_classification}")
+                print(f"  Confidence: {confidence_score:.2f} -> {confidence_tier} of points")
+                print(f"  Score: {file_score:.2f}/{score_per_file:.2f}\n")
+            else:
+                file_results.append({'fileIndex': idx, 'status': 'success', 'matched': False, 'score': 0,
+                                     'actualClassification': response_classification, 'confidenceScore': confidence_score})
+                print(f"  WRONG: {response_classification} (Expected: {expected_classification})")
+                print(f"  Score: 0/{score_per_file:.2f}\n")
+
+        except requests.exceptions.Timeout:
+            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Timeout (>30s)', 'score': 0})
+            print(f"  TIMEOUT: Request took longer than 30 seconds\n")
+        except requests.exceptions.ConnectionError:
+            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Connection error', 'score': 0})
+            print(f"  CONNECTION ERROR\n")
+        except Exception as e:
+            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': str(e), 'score': 0})
+            print(f"  ERROR: {e}\n")
+
+    final_score = round(total_score)
+
+    print(f"{'='*60}")
+    print(f"EVALUATION SUMMARY")
+    print(f"{'='*60}")
+    print(f"Total Files Tested: {total_files}")
+    print(f"Final Score: {final_score}/100")
+    print(f"{'='*60}\n")

+    successful = sum(1 for r in file_results if r.get('matched', False))
+    failed = sum(1 for r in file_results if r['status'] == 'failed')
+    wrong = sum(1 for r in file_results if r['status'] == 'success' and not r.get('matched', False))
+
+    print(f"Correct Classifications: {successful}/{total_files}")
+    print(f"Wrong Classifications: {wrong}/{total_files}")
+    print(f"Failed/Errors: {failed}/{total_files}\n")
+
+    with open('evaluation_results.json', 'w') as f:
+        json.dump({'finalScore': final_score, 'totalFiles': total_files, 'scorePerFile': round(score_per_file, 2),
+                   'successfulClassifications': successful, 'wrongClassifications': wrong, 'failedTests': failed,
+                   'fileResults': file_results}, f, indent=2)
+    print(f"Detailed results saved to: evaluation_results.json\n")
+    return True
+
+
+if __name__ == '__main__':
+    ENDPOINT_URL = 'https://shivam-2211-voice-detection-api.hf.space/api/voice-detection'
+    API_KEY = 'sk_test_voice_detection_2026'
+
+    DIR = r'c:\Users\shiva\OneDrive\Desktop\Voice Project\voice-detection-api\drive-download-20260216T053632Z-1-001'
+
+    TEST_FILES = [
+        {'language': 'English', 'file_path': f'{DIR}\\English_voice_AI_GENERATED.mp3', 'expected_classification': 'AI_GENERATED'},
+        {'language': 'Hindi', 'file_path': f'{DIR}\\Hindi_Voice_HUMAN.mp3', 'expected_classification': 'HUMAN'},
+        {'language': 'Malayalam', 'file_path': f'{DIR}\\Malayalam_AI_GENERATED.mp3', 'expected_classification': 'AI_GENERATED'},
+        {'language': 'Tamil', 'file_path': f'{DIR}\\TAMIL_VOICE__HUMAN.mp3', 'expected_classification': 'HUMAN'},
+        {'language': 'Telugu', 'file_path': f'{DIR}\\Telugu_Voice_AI_GENERATED.mp3', 'expected_classification': 'AI_GENERATED'},
+    ]
+
+    evaluate_voice_detection_api(ENDPOINT_URL, API_KEY, TEST_FILES)