shivam0897-i committed
Commit · 8a6ab53 · Parent(s): 395a38b
perf: optimize deps, imports, Dockerfile; improve README for judges
Files changed:
- .dockerignore +46 -0
- Dockerfile +2 -7
- README.md +134 -49
- audio_utils.py +2 -31
- evaluation_results.json +50 -0
- main.py +4 -7
- model.py +3 -6
- requirements.txt +2 -8
- run_final_tests.py +0 -44
- test_my_api.py +171 -0
.dockerignore
ADDED

```diff
@@ -0,0 +1,46 @@
+# Tests and test data
+tests/
+_test_*.py
+test_*.py
+test_*.json
+pytest.ini
+drive-download-*/
+run_final_tests.py
+test_my_api.py
+
+# Documentation (not needed at runtime)
+docs/
+*.md
+!README.md
+
+# Training artifacts
+training/
+*.ipynb
+
+# Python caches
+__pycache__/
+*.pyc
+*.pyo
+
+# IDE and OS files
+.vscode/
+.idea/
+*.swp
+.DS_Store
+Thumbs.db
+
+# Scripts
+scripts/
+
+# Analysis results
+*.json
+!test_request.json
+!test_valid.json
+
+# Git
+.git/
+.gitignore
+
+# Env files
+.env
+.env.*
```
Dockerfile
CHANGED

```diff
@@ -6,13 +6,8 @@ WORKDIR /app
 RUN apt-get update && apt-get install -y \
     libsndfile1 \
     ffmpeg \
-    git \
-    git-lfs \
     && rm -rf /var/lib/apt/lists/*

-# Initialize git lfs
-RUN git lfs install
-
 # Copy requirements first for better caching
 COPY requirements.txt .

@@ -36,5 +31,5 @@ WORKDIR /app
 # Hugging Face Spaces uses port 7860
 EXPOSE 7860

-# Run the application
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
+# Run the application (2 workers for concurrent request handling)
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2"]
```
README.md
CHANGED

Removed from the old README:

```markdown
Detects whether a voice sample is AI-generated or spoken by a real human using a fine-tuned Wav2Vec2 model.

## Realtime Session APIs

The backend also supports session-based realtime analysis:

- `POST /v1/session/start`
- `POST /v1/session/{session_id}/chunk`
- `GET /v1/session/{session_id}/summary`
- `GET /v1/session/{session_id}/alerts`
- `POST /v1/session/{session_id}/end`

Compatibility aliases are available under `/api/voice-detection/v1/...`.

## Optional LLM Semantic Verifier

A second-layer semantic verifier can be enabled to improve ambiguous chunk scoring:

- `LLM_PROVIDER=openai` with `OPENAI_API_KEY=<your_key>`, or
- `LLM_PROVIDER=gemini` with `GEMINI_API_KEY=<your_key>`
- Tune with `LLM_SEMANTIC_*` env variables in `.env.example`.
```

New README content:

````markdown
# AI Voice Detection API

Detects whether a voice sample is **AI-generated** or spoken by a **real human** using a fine-tuned Wav2Vec2 transformer model combined with multi-signal forensic analysis.

## Model Architecture

```
Audio Input (Base64 MP3/WAV)
        │
        ▼
┌─────────────────────┐
│ Audio Preprocessing │  librosa 16 kHz mono, normalization
└────────┬────────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌────────┐ ┌──────────────────┐
│Wav2Vec2│ │ Signal Forensics │
│ Model  │ │  (4 dimensions)  │
└───┬────┘ └───────┬──────────┘
    │              │
    ▼              ▼
 Softmax     ┌─────────────┐
 Confidence  │ Pitch       │
    │        │ Spectral    │
    │        │ Temporal    │
    │        │ Authenticity│
    │        └──────┬──────┘
    └───────┬───────┘
            ▼
   Final Classification
  (HUMAN / AI_GENERATED)
```

### Key Components

| Component | Description |
|-----------|-------------|
| **ML Backbone** | [Wav2Vec2ForSequenceClassification](https://huggingface.co/shivam-2211/voice-detection-model) fine-tuned on human vs. AI-generated speech |
| **Temperature Scaling** | Logits scaled by T=1.5 before softmax for well-calibrated confidence scores |
| **Signal Forensics** | Pitch stability, spectral entropy, temporal rhythm, and acoustic anomaly detection |
| **ASR Integration** | Faster-Whisper (tiny, int8) for language detection and transcript extraction |
| **Timeout Safety** | 20-second budget with audio truncation to guarantee a <30s response |

## Quick Start

### Prerequisites

- Python 3.10+
- FFmpeg (`apt-get install ffmpeg` or `brew install ffmpeg`)

### Local Setup

```bash
# Clone the repository
git clone https://github.com/shivam0897-i/voice_backend.git
cd voice_backend

# Install CPU-only PyTorch
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install dependencies
pip install -r requirements.txt

# Set your API key
echo "API_KEY=your_secret_key" > .env

# Run the server
uvicorn main:app --host 0.0.0.0 --port 7860
```

### Docker

```bash
docker build -t voice-detection-api .
docker run -p 7860:7860 -e API_KEY=your_secret_key voice-detection-api
```

## API Endpoint

### `POST /api/voice-detection`

**Headers:**

| Header | Description |
|--------|-------------|
| `Content-Type` | `application/json` |
| `x-api-key` | Your API key (set via `API_KEY` env var) |

**Request Body:**
```json
{
  "language": "English",
  "audioFormat": "mp3",
  "audioBase64": "<base64-encoded audio>"
}
```

**Response (200 OK):**
```json
{
  "status": "success",
  "language": "English",
  "classification": "AI_GENERATED",
  "confidenceScore": 0.99,
  "explanation": "AI voice indicators detected with high confidence..."
}
```

**Example with curl:**
```bash
# Encode audio to Base64 and send
AUDIO_B64=$(base64 -w0 sample.mp3)
curl -X POST https://shivam-2211-voice-detection-api.hf.space/api/voice-detection \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_KEY" \
  -d "{\"language\": \"English\", \"audioFormat\": \"mp3\", \"audioBase64\": \"$AUDIO_B64\"}"
```

## Supported Languages

| Language | Code |
|----------|------|
| English | `English` |
| Hindi | `Hindi` |
| Tamil | `Tamil` |
| Malayalam | `Malayalam` |
| Telugu | `Telugu` |
| Auto-detect | `Auto` |

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `API_KEY` | **Yes** | — | API authentication key |
| `MODEL_NAME` | No | `shivam-2211/voice-detection-model` | HuggingFace model ID |
| `MODEL_LOGIT_TEMPERATURE` | No | `1.5` | Softmax temperature scaling |
| `SESSION_STORE_BACKEND` | No | `redis` | Session backend (`memory` or `redis`) |
| `REDIS_URL` | No | — | Redis connection URL |
| `LLM_SEMANTIC_ENABLED` | No | `false` | Enable LLM semantic verifier |
| `PORT` | No | `7860` | Server port |

## Deployment

The API is deployed on **HuggingFace Spaces** using Docker:

- **Live URL**: `https://shivam-2211-voice-detection-api.hf.space`
- **Health Check**: `GET /health`
- **Infrastructure**: CPU inference, 2 Uvicorn workers, Redis session store

## Project Structure

```
├── main.py            # FastAPI app, all endpoints, error handling
├── model.py           # Wav2Vec2 inference + signal forensics engine
├── audio_utils.py     # Base64 decoding, audio validation, loading
├── config.py          # Pydantic Settings (env-based configuration)
├── speech_to_text.py  # Faster-Whisper ASR integration
├── fraud_language.py  # Fraud language pattern detection
├── privacy_utils.py   # PII redaction utilities
├── Dockerfile         # Production Docker image
├── requirements.txt   # Python dependencies
└── tests/             # Test suite
```

## License

MIT
````
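The temperature scaling mentioned under Key Components can be sketched in a few lines: dividing the logits by T > 1 before the softmax softens overconfident probabilities. This is a minimal illustration (plain Python, made-up logit values), not the repo's actual `model.py` code:

```python
import math

def scaled_softmax(logits, temperature=1.5):
    # Divide logits by T before softmax; T > 1 flattens the distribution,
    # pulling extreme confidence scores toward the middle.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# With T=1.5 the top-class probability drops relative to plain softmax.
calibrated = scaled_softmax([4.0, 0.0], temperature=1.5)
plain = scaled_softmax([4.0, 0.0], temperature=1.0)
```

The probabilities still sum to 1; only their spread changes, which is why this is a calibration trick rather than a change to the classifier's decisions.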
audio_utils.py
CHANGED

```diff
@@ -8,6 +8,8 @@ import os
 import logging
 from typing import Tuple, Optional
 import numpy as np
+import librosa
+import soundfile as sf

 # Configure logging
 logger = logging.getLogger(__name__)

@@ -113,9 +115,6 @@ def load_audio_from_bytes(audio_bytes: bytes, target_sr: int = 22050, audio_format
     tmp_path = None
     try:
-        import librosa
-        import soundfile as sf
-
         # Normalize format
         audio_format = audio_format.lower().strip()
         if audio_format.startswith("."):

@@ -153,31 +152,3 @@ def load_audio_from_bytes(audio_bytes: bytes, target_sr: int = 22050, audio_format
             pass  # Best effort cleanup


-def get_audio_duration(audio: np.ndarray, sr: int) -> float:
-    """
-    Calculate the duration of audio in seconds.
-
-    Args:
-        audio: Audio waveform
-        sr: Sample rate
-
-    Returns:
-        Duration in seconds
-    """
-    return len(audio) / sr
-
-
-def normalize_audio(audio: np.ndarray) -> np.ndarray:
-    """
-    Normalize audio to have maximum amplitude of 1.0.
-
-    Args:
-        audio: Audio waveform
-
-    Returns:
-        Normalized audio
-    """
-    max_val = np.max(np.abs(audio))
-    if max_val > 0:
-        return audio / max_val
-    return audio
```
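The two deleted helpers were unused one-liners, which is why removing them is safe: any caller can inline the same logic. A dependency-free sketch of the two operations (plain lists instead of NumPy arrays, for illustration only):

```python
def audio_duration(samples, sr):
    # Duration in seconds is simply sample count divided by sample rate.
    return len(samples) / sr

def peak_normalize(samples):
    # Scale so the maximum absolute amplitude is 1.0; a silent (or empty)
    # signal is returned unchanged to avoid division by zero.
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else samples
```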
evaluation_results.json
ADDED

```json
{
  "finalScore": 100,
  "totalFiles": 5,
  "scorePerFile": 20.0,
  "successfulClassifications": 5,
  "wrongClassifications": 0,
  "failedTests": 0,
  "fileResults": [
    { "fileIndex": 0, "status": "success", "matched": true, "score": 20.0, "actualClassification": "AI_GENERATED", "confidenceScore": 0.99 },
    { "fileIndex": 1, "status": "success", "matched": true, "score": 20.0, "actualClassification": "HUMAN", "confidenceScore": 0.99 },
    { "fileIndex": 2, "status": "success", "matched": true, "score": 20.0, "actualClassification": "AI_GENERATED", "confidenceScore": 0.99 },
    { "fileIndex": 3, "status": "success", "matched": true, "score": 20.0, "actualClassification": "HUMAN", "confidenceScore": 0.99 },
    { "fileIndex": 4, "status": "success", "matched": true, "score": 20.0, "actualClassification": "AI_GENERATED", "confidenceScore": 0.99 }
  ]
}
```
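The summary fields in this JSON are all derivable from `fileResults`: `finalScore` is the rounded sum of per-file scores, and the counts come from the `matched`/`status` flags. A minimal sketch of that aggregation, with the rows hard-coded to mirror this file:

```python
# Rows mirror evaluation_results.json: 5 correct matches at 20.0 points each.
file_results = [
    {"status": "success", "matched": True, "score": 20.0},
    {"status": "success", "matched": True, "score": 20.0},
    {"status": "success", "matched": True, "score": 20.0},
    {"status": "success", "matched": True, "score": 20.0},
    {"status": "success", "matched": True, "score": 20.0},
]

final_score = round(sum(r["score"] for r in file_results))
successful = sum(1 for r in file_results if r.get("matched", False))
failed = sum(1 for r in file_results if r["status"] == "failed")
wrong = sum(1 for r in file_results if r["status"] == "success" and not r.get("matched", False))
```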
main.py
CHANGED

```diff
@@ -17,9 +17,11 @@ from datetime import datetime, timezone
 from typing import Optional, Any, Dict, List
 from contextlib import asynccontextmanager
 import numpy as np
-from fastapi import FastAPI, HTTPException, Request, Depends, WebSocket, WebSocketDisconnect
+from fastapi import FastAPI, HTTPException, Request, Depends, WebSocket, WebSocketDisconnect, Security
 from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import JSONResponse
+from fastapi.responses import JSONResponse, RedirectResponse
+from fastapi.security import APIKeyHeader
+from fastapi.exceptions import RequestValidationError
 from pydantic import BaseModel, Field, field_validator, ValidationError
 from slowapi import Limiter, _rate_limit_exceeded_handler
 from slowapi.util import get_remote_address

@@ -353,8 +355,6 @@ async def lifespan(app: FastAPI):
     logger.info("Shutting down...")


-from fastapi.responses import RedirectResponse
-
 # Initialize FastAPI app with lifespan
 app = FastAPI(
     title="AI Voice Detection API",

@@ -1737,8 +1737,6 @@ def session_to_summary(session: SessionState) -> SessionSummaryResponse:


 # Authentication
-from fastapi.security import APIKeyHeader
-from fastapi import Security

 api_key_header = APIKeyHeader(name="x-api-key", auto_error=False)  # Changed to False for better error messages

@@ -2152,7 +2150,6 @@ async def detect_voice(


 # Exception handlers
-from fastapi.exceptions import RequestValidationError

 def to_json_safe(value: Any) -> Any:
     """Recursively convert values to JSON-safe primitives."""
```
model.py
CHANGED

```diff
@@ -5,6 +5,9 @@ Combines Wav2Vec2 deepfake detection with signal forensics.
 import logging
 import os
 import numpy as np
+import librosa
+import torch
+from scipy.stats import entropy
 from typing import Dict, Tuple, List, Optional
 from dataclasses import dataclass
 import warnings

@@ -57,7 +60,6 @@ def get_device():
     """Get the best available device (GPU or CPU)."""
     global _device
     if _device is None:
-        import torch
         if torch.cuda.is_available():
             _device = "cuda"
         else:

@@ -136,8 +138,6 @@ def load_model():

 def extract_signal_features(audio: np.ndarray, sr: int, fast_mode: bool = False) -> Dict[str, float]:
     """Extract signal-based features (pitch, entropy, silence)."""
-    import librosa
-    from scipy.stats import entropy

     features = {}

@@ -475,9 +475,6 @@ def classify_with_model(audio: np.ndarray, sr: int) -> Tuple[str, float]:
     Returns:
         Tuple of (classification, confidence)
     """
-    import torch
-    import librosa
-
     model, processor = load_model()
     device = get_device()
```
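For context on the `from scipy.stats import entropy` import hoisted to the top here: a spectral-entropy feature treats the normalized power spectrum as a probability distribution and measures its Shannon entropy (flat, noise-like spectra score high; tonal, peaked spectra score low). A dependency-free sketch of the idea, not the repo's actual feature code:

```python
import math

def spectral_entropy(power_spectrum):
    # Normalize the spectrum into a probability distribution, then compute
    # Shannon entropy in bits. Returns 0.0 for an all-zero spectrum.
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    probs = [p / total for p in power_spectrum]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A perfectly flat 4-bin spectrum yields log2(4) = 2 bits, while a single-peak spectrum yields 0, which is the spread this forensic signal exploits.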
requirements.txt
CHANGED

```diff
@@ -8,16 +8,10 @@ scipy>=1.10.0
 python-dotenv
 pydantic>=2.0.0
 transformers>=4.30.0
-datasets>=2.14.0
-scikit-learn>=1.3.0
-accelerate>=0.20.0
 slowapi>=0.1.9
 pydantic-settings>=2.0.0
 httpx>=0.27.0
-# PyTorch - install manually for your platform if not using Docker:
-# pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
-torch>=2.0.0
-torchaudio>=2.0.0
 faster-whisper>=1.0.3
-
 redis>=5.0.0
+# PyTorch CPU — installed separately in Dockerfile for smaller image.
+# For local dev: pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
```
run_final_tests.py
DELETED

```python
"""Final hackathon test: all 5 files against legacy POST /api/voice-detection"""
import base64, json, time, requests

DIR = r"c:\Users\shiva\OneDrive\Desktop\Voice Project\voice-detection-api\drive-download-20260216T053632Z-1-001"
URL = "http://localhost:7860/api/voice-detection"
HEADERS = {"Content-Type": "application/json", "x-api-key": "sk_test_voice_detection_2026"}

FILES = [
    ("English_voice_AI_GENERATED.mp3", "English", "AI_GENERATED"),
    ("Hindi_Voice_HUMAN.mp3", "Hindi", "HUMAN"),
    ("Malayalam_AI_GENERATED.mp3", "Malayalam", "AI_GENERATED"),
    ("TAMIL_VOICE__HUMAN.mp3", "Tamil", "HUMAN"),
    ("Telugu_Voice_AI_GENERATED.mp3", "Telugu", "AI_GENERATED"),
]

print("=" * 90)
print(f"{'File':<42} {'Expected':<16} {'Got':<16} {'Conf':>6} Result")
print("=" * 90)

passed = 0
for fname, lang, expected in FILES:
    with open(f"{DIR}\\{fname}", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {"audioBase64": b64, "language": lang, "audioFormat": "mp3"}
    t0 = time.time()
    try:
        r = requests.post(URL, json=payload, headers=HEADERS, timeout=30)
        elapsed = time.time() - t0
        d = r.json()
        cls = d.get("classification", "?")
        conf = d.get("confidenceScore", "?")
        ok = cls == expected
        if ok:
            passed += 1
        tag = "PASS" if ok else "FAIL"
        print(f"{fname:<42} {expected:<16} {cls:<16} {conf:>6} {tag} ({elapsed:.1f}s)")
    except Exception as e:
        elapsed = time.time() - t0
        print(f"{fname:<42} {expected:<16} {'ERROR':<16} {'--':>6} FAIL ({elapsed:.1f}s) {e}")
    # small pause between requests to avoid CPU thermal throttle
    time.sleep(2)

print("=" * 90)
print(f"Result: {passed}/{len(FILES)} passed")
```
test_my_api.py
ADDED

```python
"""
Official evaluation script from the hackathon guide, configured with our 5 test files.
This mirrors EXACTLY what the evaluator will run.
"""
import requests
import base64
import json

def evaluate_voice_detection_api(endpoint_url, api_key, test_files):
    if not endpoint_url:
        print("Error: Endpoint URL is required")
        return False
    if not test_files or len(test_files) == 0:
        print("Error: No test files provided")
        return False

    total_files = len(test_files)
    score_per_file = 100 / total_files
    total_score = 0
    file_results = []

    print(f"\n{'='*60}")
    print(f"Starting Evaluation")
    print(f"{'='*60}")
    print(f"Endpoint: {endpoint_url}")
    print(f"Total Test Files: {total_files}")
    print(f"Score per File: {score_per_file:.2f}")
    print(f"{'='*60}\n")

    for idx, file_data in enumerate(test_files):
        language = file_data.get('language', 'English')
        file_path = file_data.get('file_path', '')
        expected_classification = file_data.get('expected_classification', '')

        print(f"Test {idx + 1}/{total_files}: {file_path}")

        if not file_path or not expected_classification:
            file_results.append({'fileIndex': idx, 'status': 'skipped', 'score': 0})
            print(f"  Skipped: Missing file path or expected classification\n")
            continue

        try:
            with open(file_path, 'rb') as audio_file:
                audio_base64 = base64.b64encode(audio_file.read()).decode('utf-8')
        except Exception as e:
            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Failed to read: {e}', 'score': 0})
            print(f"  Failed to read file: {e}\n")
            continue

        headers = {'Content-Type': 'application/json', 'x-api-key': api_key}
        request_body = {'language': language, 'audioFormat': 'mp3', 'audioBase64': audio_base64}

        try:
            response = requests.post(endpoint_url, headers=headers, json=request_body, timeout=30)

            if response.status_code != 200:
                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'HTTP {response.status_code}', 'score': 0})
                print(f"  HTTP Status: {response.status_code}")
                print(f"  Response: {response.text[:200]}\n")
                continue

            response_data = response.json()

            if not isinstance(response_data, dict):
                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Not a JSON object', 'score': 0})
                print(f"  Invalid response type\n")
                continue

            response_status = response_data.get('status', '')
            response_classification = response_data.get('classification', '')
            confidence_score = response_data.get('confidenceScore', None)

            if not response_status or not response_classification or confidence_score is None:
                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Missing required fields', 'score': 0})
                print(f"  Missing required fields")
                print(f"  Response: {json.dumps(response_data, indent=2)[:200]}\n")
                continue

            if response_status != 'success':
                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Status: {response_status}', 'score': 0})
                print(f"  Status not 'success': {response_status}\n")
                continue

            if not isinstance(confidence_score, (int, float)) or confidence_score < 0 or confidence_score > 1:
                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Invalid confidence: {confidence_score}', 'score': 0})
                print(f"  Invalid confidence score: {confidence_score}\n")
                continue

            valid_classifications = ['HUMAN', 'AI_GENERATED']
            if response_classification not in valid_classifications:
                file_results.append({'fileIndex': idx, 'status': 'failed', 'message': f'Invalid classification: {response_classification}', 'score': 0})
                print(f"  Invalid classification: {response_classification}\n")
                continue

            # Score calculation
            file_score = 0
            if response_classification == expected_classification:
                if confidence_score >= 0.8:
                    file_score = score_per_file
                    confidence_tier = "100%"
                elif confidence_score >= 0.6:
                    file_score = score_per_file * 0.75
                    confidence_tier = "75%"
                elif confidence_score >= 0.4:
                    file_score = score_per_file * 0.5
                    confidence_tier = "50%"
                else:
                    file_score = score_per_file * 0.25
                    confidence_tier = "25%"
                total_score += file_score
                file_results.append({'fileIndex': idx, 'status': 'success', 'matched': True, 'score': round(file_score, 2),
                                     'actualClassification': response_classification, 'confidenceScore': confidence_score})
                print(f"  CORRECT: {response_classification}")
                print(f"  Confidence: {confidence_score:.2f} -> {confidence_tier} of points")
                print(f"  Score: {file_score:.2f}/{score_per_file:.2f}\n")
            else:
                file_results.append({'fileIndex': idx, 'status': 'success', 'matched': False, 'score': 0,
                                     'actualClassification': response_classification, 'confidenceScore': confidence_score})
                print(f"  WRONG: {response_classification} (Expected: {expected_classification})")
                print(f"  Score: 0/{score_per_file:.2f}\n")

        except requests.exceptions.Timeout:
            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Timeout (>30s)', 'score': 0})
            print(f"  TIMEOUT: Request took longer than 30 seconds\n")
        except requests.exceptions.ConnectionError:
            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': 'Connection error', 'score': 0})
            print(f"  CONNECTION ERROR\n")
        except Exception as e:
            file_results.append({'fileIndex': idx, 'status': 'failed', 'message': str(e), 'score': 0})
            print(f"  ERROR: {e}\n")

    final_score = round(total_score)

    print(f"{'='*60}")
    print(f"EVALUATION SUMMARY")
    print(f"{'='*60}")
    print(f"Total Files Tested: {total_files}")
    print(f"Final Score: {final_score}/100")
    print(f"{'='*60}\n")

    successful = sum(1 for r in file_results if r.get('matched', False))
    failed = sum(1 for r in file_results if r['status'] == 'failed')
    wrong = sum(1 for r in file_results if r['status'] == 'success' and not r.get('matched', False))

    print(f"Correct Classifications: {successful}/{total_files}")
    print(f"Wrong Classifications: {wrong}/{total_files}")
    print(f"Failed/Errors: {failed}/{total_files}\n")

    with open('evaluation_results.json', 'w') as f:
        json.dump({'finalScore': final_score, 'totalFiles': total_files, 'scorePerFile': round(score_per_file, 2),
                   'successfulClassifications': successful, 'wrongClassifications': wrong, 'failedTests': failed,
                   'fileResults': file_results}, f, indent=2)
    print(f"Detailed results saved to: evaluation_results.json\n")
    return True


if __name__ == '__main__':
    ENDPOINT_URL = 'https://shivam-2211-voice-detection-api.hf.space/api/voice-detection'
    API_KEY = 'sk_test_voice_detection_2026'

    DIR = r'c:\Users\shiva\OneDrive\Desktop\Voice Project\voice-detection-api\drive-download-20260216T053632Z-1-001'

    TEST_FILES = [
        {'language': 'English',   'file_path': f'{DIR}\\English_voice_AI_GENERATED.mp3', 'expected_classification': 'AI_GENERATED'},
        {'language': 'Hindi',     'file_path': f'{DIR}\\Hindi_Voice_HUMAN.mp3',          'expected_classification': 'HUMAN'},
        {'language': 'Malayalam', 'file_path': f'{DIR}\\Malayalam_AI_GENERATED.mp3',     'expected_classification': 'AI_GENERATED'},
        {'language': 'Tamil',     'file_path': f'{DIR}\\TAMIL_VOICE__HUMAN.mp3',         'expected_classification': 'HUMAN'},
        {'language': 'Telugu',    'file_path': f'{DIR}\\Telugu_Voice_AI_GENERATED.mp3',  'expected_classification': 'AI_GENERATED'},
    ]

    evaluate_voice_detection_api(ENDPOINT_URL, API_KEY, TEST_FILES)
```
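The rubric's confidence tiers in this script reduce to a small multiplier lookup. Extracted here as a standalone function for clarity (a sketch for illustration, not part of the committed file):

```python
def tier_multiplier(confidence):
    # Hackathon rubric: full points at confidence >= 0.8, then 75% / 50% / 25%
    # of the per-file score as confidence drops through 0.6 and 0.4.
    if confidence >= 0.8:
        return 1.0
    if confidence >= 0.6:
        return 0.75
    if confidence >= 0.4:
        return 0.5
    return 0.25
```

A correct classification at 0.99 confidence thus earns the full 20 points per file, which is how the evaluation above reached 100/100.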