Spaces:
Sleeping
Sleeping
Commit ·
1cb0653
0
Parent(s):
Added AI_Voice_Detector
Browse files- .dockerimage +24 -0
- .gitignore +80 -0
- DockerFile +46 -0
- README.md +488 -0
- app.py +1053 -0
- client.py +209 -0
- detector.py +875 -0
- download_models.py +92 -0
- pytest.ini +4 -0
- requirements.txt +14 -0
- self_learning_train.py +245 -0
- tests/conftest.py +144 -0
- tests/test_api.py +177 -0
- tests/test_feedback.py +67 -0
- tests/test_integration_model.py +60 -0
- tests/test_streaming.py +92 -0
- try.ipynb +0 -0
.dockerimage
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
__pycache__
|
| 2 |
+
*.pyc
|
| 3 |
+
*.pyo
|
| 4 |
+
*.pyd
|
| 5 |
+
.Python
|
| 6 |
+
*.so
|
| 7 |
+
*.egg
|
| 8 |
+
*.egg-info
|
| 9 |
+
dist
|
| 10 |
+
build
|
| 11 |
+
.pytest_cache
|
| 12 |
+
.coverage
|
| 13 |
+
htmlcov
|
| 14 |
+
.env.local
|
| 15 |
+
.DS_Store
|
| 16 |
+
*.log
|
| 17 |
+
test_audio/
|
| 18 |
+
logs/
|
| 19 |
+
*.md
|
| 20 |
+
.git
|
| 21 |
+
.gitignore
|
| 22 |
+
docker-compose.yml
|
| 23 |
+
test_api.py
|
| 24 |
+
client.py
|
.gitignore
ADDED
|
@@ -0,0 +1,80 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
AI_voice_dataset/
|
| 2 |
+
Deepfake-audio-detection-V2/
|
| 3 |
+
wav2vec2_finetuned_model/
|
| 4 |
+
wav2vec2-deepfake-voice-detector/
|
| 5 |
+
trained_voice_features.csv
|
| 6 |
+
voice_auth_model.pkl
|
| 7 |
+
|
| 8 |
+
.env
|
| 9 |
+
test.py
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
# Python
|
| 13 |
+
__pycache__/
|
| 14 |
+
*.py[cod]
|
| 15 |
+
*$py.class
|
| 16 |
+
*.so
|
| 17 |
+
.Python
|
| 18 |
+
build/
|
| 19 |
+
develop-eggs/
|
| 20 |
+
dist/
|
| 21 |
+
downloads/
|
| 22 |
+
eggs/
|
| 23 |
+
.eggs/
|
| 24 |
+
lib/
|
| 25 |
+
lib64/
|
| 26 |
+
parts/
|
| 27 |
+
sdist/
|
| 28 |
+
var/
|
| 29 |
+
wheels/
|
| 30 |
+
*.egg-info/
|
| 31 |
+
.installed.cfg
|
| 32 |
+
*.egg
|
| 33 |
+
|
| 34 |
+
# Virtual Environment
|
| 35 |
+
venv/
|
| 36 |
+
ENV/
|
| 37 |
+
env/
|
| 38 |
+
.venv
|
| 39 |
+
|
| 40 |
+
# IDE
|
| 41 |
+
.vscode/
|
| 42 |
+
.idea/
|
| 43 |
+
*.swp
|
| 44 |
+
*.swo
|
| 45 |
+
*~
|
| 46 |
+
|
| 47 |
+
# Testing
|
| 48 |
+
.pytest_cache/
|
| 49 |
+
.coverage
|
| 50 |
+
htmlcov/
|
| 51 |
+
.tox/
|
| 52 |
+
|
| 53 |
+
# Environment
|
| 54 |
+
.env
|
| 55 |
+
.env.local
|
| 56 |
+
.env.*.local
|
| 57 |
+
|
| 58 |
+
# Logs
|
| 59 |
+
*.log
|
| 60 |
+
logs/
|
| 61 |
+
|
| 62 |
+
# OS
|
| 63 |
+
.DS_Store
|
| 64 |
+
Thumbs.db
|
| 65 |
+
|
| 66 |
+
# Audio files (for testing)
|
| 67 |
+
test_audio/
|
| 68 |
+
*.mp3
|
| 69 |
+
*.wav
|
| 70 |
+
|
| 71 |
+
# Self-learning data
|
| 72 |
+
data/
|
| 73 |
+
|
| 74 |
+
# Docker
|
| 75 |
+
.dockerignore
|
| 76 |
+
|
| 77 |
+
# Temporary files
|
| 78 |
+
*.tmp
|
| 79 |
+
temp/
|
| 80 |
+
tmp/
|
DockerFile
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.10-slim
|
| 2 |
+
|
| 3 |
+
# Set up a new user named "user" with user ID 1000 (required by HuggingFace Spaces)
|
| 4 |
+
RUN useradd -m -u 1000 user
|
| 5 |
+
|
| 6 |
+
# Set working directory
|
| 7 |
+
WORKDIR /app
|
| 8 |
+
|
| 9 |
+
# System dependencies for audio processing
|
| 10 |
+
RUN apt-get update && apt-get install -y \
|
| 11 |
+
libsndfile1 \
|
| 12 |
+
ffmpeg \
|
| 13 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 14 |
+
|
| 15 |
+
# Copy requirements and install as root
|
| 16 |
+
COPY --chown=user requirements.txt /app/requirements.txt
|
| 17 |
+
RUN pip install --no-cache-dir --upgrade pip && \
|
| 18 |
+
pip install --no-cache-dir -r requirements.txt
|
| 19 |
+
|
| 20 |
+
# Copy application files with correct ownership
|
| 21 |
+
COPY --chown=user app.py /app/
|
| 22 |
+
COPY --chown=user detector.py /app/
|
| 23 |
+
COPY --chown=user self_learning_train.py /app/
|
| 24 |
+
|
| 25 |
+
# Switch to the "user" user
|
| 26 |
+
USER user
|
| 27 |
+
|
| 28 |
+
# Set home to the user's home directory
|
| 29 |
+
ENV HOME=/home/user \
|
| 30 |
+
PATH=/home/user/.local/bin:$PATH \
|
| 31 |
+
PYTHONUNBUFFERED=1
|
| 32 |
+
|
| 33 |
+
# Pre-download models (will be cached in user's home)
|
| 34 |
+
RUN python -c "from transformers import AutoModelForAudioClassification, AutoFeatureExtractor, WhisperProcessor, WhisperForConditionalGeneration; \
|
| 35 |
+
print('Downloading models...'); \
|
| 36 |
+
AutoModelForAudioClassification.from_pretrained('garystafford/wav2vec2-deepfake-voice-detector'); \
|
| 37 |
+
AutoFeatureExtractor.from_pretrained('garystafford/wav2vec2-deepfake-voice-detector'); \
|
| 38 |
+
WhisperProcessor.from_pretrained('openai/whisper-base'); \
|
| 39 |
+
WhisperForConditionalGeneration.from_pretrained('openai/whisper-base'); \
|
| 40 |
+
print('Models downloaded successfully')"
|
| 41 |
+
|
| 42 |
+
# Expose HuggingFace Spaces port
|
| 43 |
+
EXPOSE 7860
|
| 44 |
+
|
| 45 |
+
# Run with uvicorn (FastAPI)
|
| 46 |
+
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
|
README.md
ADDED
|
@@ -0,0 +1,488 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🎙️ Voice Detection API
|
| 2 |
+
|
| 3 |
+
A production-ready REST API that detects whether a voice recording is AI-generated or human using hybrid analysis (physics-based + deep learning).
|
| 4 |
+
|
| 5 |
+
## 🌟 Features
|
| 6 |
+
|
| 7 |
+
- ✅ **Multi-language Support**: Tamil, English, Hindi, Malayalam, Telugu
|
| 8 |
+
- ✅ **Hybrid Detection**: Combines physics analysis + Wav2Vec2 deepfake detection
|
| 9 |
+
- ✅ **Language Detection**: Automatic language identification using Whisper
|
| 10 |
+
- ✅ **Secure**: API key authentication
|
| 11 |
+
- ✅ **Fast**: Auto-truncates to 30 seconds for quick processing
|
| 12 |
+
- ✅ **Production Ready**: Docker support, logging, health checks
|
| 13 |
+
- ✅ **Realtime Streaming**: WebSocket streaming with partial results
|
| 14 |
+
- ✅ **Self-Learning Ready**: Feedback collection + calibration training
|
| 15 |
+
|
| 16 |
+
## 📁 Project Structure
|
| 17 |
+
|
| 18 |
+
```
|
| 19 |
+
voice-detection-api/
|
| 20 |
+
├── app.py # Flask API application
|
| 21 |
+
├── detector.py # Your HybridEnsembleDetector class
|
| 22 |
+
├── self_learning_train.py # Calibration training from feedback data
|
| 23 |
+
├── client.py # Example Python client
|
| 24 |
+
├── test_api.py # Automated test suite
|
| 25 |
+
├── requirements.txt # Python dependencies
|
| 26 |
+
├── Dockerfile # Docker configuration
|
| 27 |
+
├── docker-compose.yml # Docker Compose setup
|
| 28 |
+
├── .env # Environment variables
|
| 29 |
+
├── DEPLOYMENT.md # Detailed deployment guide
|
| 30 |
+
└── README.md # This file
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
## 🚀 Quick Start
|
| 34 |
+
|
| 35 |
+
### Prerequisites
|
| 36 |
+
|
| 37 |
+
- Python 3.10+
|
| 38 |
+
- pip
|
| 39 |
+
- (Optional) Docker & Docker Compose
|
| 40 |
+
|
| 41 |
+
### Installation
|
| 42 |
+
|
| 43 |
+
1. **Clone the repository**
|
| 44 |
+
```bash
|
| 45 |
+
git clone <your-repo-url>
|
| 46 |
+
cd voice-detection-api
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
2. **Install dependencies**
|
| 50 |
+
```bash
|
| 51 |
+
pip install -r requirements.txt
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
3. **Set up environment variables**
|
| 55 |
+
```bash
|
| 56 |
+
# Copy the example .env file
|
| 57 |
+
cp .env.example .env
|
| 58 |
+
|
| 59 |
+
# Edit .env and set your API key
|
| 60 |
+
nano .env
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
4. **Run the API**
|
| 64 |
+
```bash
|
| 65 |
+
python app.py
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
The API will start at `http://localhost:5000`
|
| 69 |
+
|
| 70 |
+
## 🐳 Docker Deployment (Recommended)
|
| 71 |
+
|
| 72 |
+
### Quick Start with Docker Compose
|
| 73 |
+
|
| 74 |
+
```bash
|
| 75 |
+
# Start the API
|
| 76 |
+
docker-compose up -d
|
| 77 |
+
|
| 78 |
+
# Check status
|
| 79 |
+
docker-compose ps
|
| 80 |
+
|
| 81 |
+
# View logs
|
| 82 |
+
docker-compose logs -f
|
| 83 |
+
|
| 84 |
+
# Stop the API
|
| 85 |
+
docker-compose down
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
### Manual Docker Build
|
| 89 |
+
|
| 90 |
+
```bash
|
| 91 |
+
# Build image
|
| 92 |
+
docker build -t voice-detection-api .
|
| 93 |
+
|
| 94 |
+
# Run container
|
| 95 |
+
docker run -p 5000:5000 \
|
| 96 |
+
-e API_KEY="your_secret_key" \
|
| 97 |
+
voice-detection-api
|
| 98 |
+
```
|
| 99 |
+
|
| 100 |
+
## 📡 API Usage
|
| 101 |
+
|
| 102 |
+
### Health Check
|
| 103 |
+
|
| 104 |
+
```bash
|
| 105 |
+
curl http://localhost:5000/health
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
### Voice Detection
|
| 109 |
+
|
| 110 |
+
**Using cURL:**
|
| 111 |
+
```bash
|
| 112 |
+
curl -X POST http://localhost:5000/api/voice-detection \
|
| 113 |
+
-H "Content-Type: application/json" \
|
| 114 |
+
-H "x-api-key: sk_test_123456789" \
|
| 115 |
+
-d '{
|
| 116 |
+
"language": "English",
|
| 117 |
+
"audioFormat": "mp3",
|
| 118 |
+
"audioBase64": "'"$(base64 -w 0 your_audio.mp3)"'"
|
| 119 |
+
}'
|
| 120 |
+
```
|
| 121 |
+
|
| 122 |
+
**Using Python Client:**
|
| 123 |
+
```bash
|
| 124 |
+
# Single file
|
| 125 |
+
python client.py --audio test_audio.mp3 --language English
|
| 126 |
+
|
| 127 |
+
# Multiple files
|
| 128 |
+
python client.py \
|
| 129 |
+
--audio file1.mp3 \
|
| 130 |
+
--audio file2.mp3 \
|
| 131 |
+
--language Tamil
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
**Using Python Requests:**
|
| 135 |
+
```python
|
| 136 |
+
import requests
|
| 137 |
+
import base64
|
| 138 |
+
|
| 139 |
+
# Encode audio
|
| 140 |
+
with open('audio.mp3', 'rb') as f:
|
| 141 |
+
audio_base64 = base64.b64encode(f.read()).decode()
|
| 142 |
+
|
| 143 |
+
# Make request
|
| 144 |
+
response = requests.post(
|
| 145 |
+
'http://localhost:5000/api/voice-detection',
|
| 146 |
+
headers={
|
| 147 |
+
'Content-Type': 'application/json',
|
| 148 |
+
'x-api-key': 'sk_test_123456789'
|
| 149 |
+
},
|
| 150 |
+
json={
|
| 151 |
+
'language': 'English',
|
| 152 |
+
'audioFormat': 'mp3',
|
| 153 |
+
'audioBase64': audio_base64
|
| 154 |
+
}
|
| 155 |
+
)
|
| 156 |
+
|
| 157 |
+
result = response.json()
|
| 158 |
+
print(f"Classification: {result['classification']}")
|
| 159 |
+
print(f"Confidence: {result['confidenceScore']}")
|
| 160 |
+
```
|
| 161 |
+
|
| 162 |
+
### Realtime Streaming (WebSocket)
|
| 163 |
+
|
| 164 |
+
Endpoint: `ws://localhost:5000/ws/voice-stream`
|
| 165 |
+
|
| 166 |
+
Authentication:
|
| 167 |
+
- Query param: `?api_key=sk_test_123456789`
|
| 168 |
+
- Or header: `x-api-key` (non-browser clients)
|
| 169 |
+
|
| 170 |
+
Recommended streaming format: `pcm16` (16kHz, mono). This allows partial
|
| 171 |
+
results while the audio is still streaming.
|
| 172 |
+
If you stream `mp3` or `wav`, partial results are disabled and analysis runs
|
| 173 |
+
on the final buffer.
|
| 174 |
+
|
| 175 |
+
**Client -> Server messages:**
|
| 176 |
+
```json
|
| 177 |
+
{ "type": "start", "audioFormat": "pcm16", "sampleRate": 16000, "channels": 1,
|
| 178 |
+
"enablePartial": true, "partialIntervalSec": 10 }
|
| 179 |
+
```
|
| 180 |
+
```json
|
| 181 |
+
{ "type": "audio_chunk", "audioChunkBase64": "<base64_pcm_chunk>" }
|
| 182 |
+
```
|
| 183 |
+
```json
|
| 184 |
+
{ "type": "audio_chunk", "audioChunkBase64": "<base64_pcm_chunk>", "final": true }
|
| 185 |
+
```
|
| 186 |
+
|
| 187 |
+
**Server -> Client messages:**
|
| 188 |
+
```json
|
| 189 |
+
{ "type": "ack", "sessionId": "...", "status": "ready" }
|
| 190 |
+
```
|
| 191 |
+
```json
|
| 192 |
+
{ "type": "progress", "receivedBytes": 12345, "bufferBytes": 12345, "bufferSeconds": 2.1 }
|
| 193 |
+
```
|
| 194 |
+
```json
|
| 195 |
+
{ "type": "partial_result", "result": { "status": "success", "classification": "AI_GENERATED" } }
|
| 196 |
+
```
|
| 197 |
+
```json
|
| 198 |
+
{ "type": "final_result", "result": { "status": "success", "classification": "HUMAN" } }
|
| 199 |
+
```
|
| 200 |
+
|
| 201 |
+
**Browser example:**
|
| 202 |
+
```javascript
|
| 203 |
+
const ws = new WebSocket("ws://localhost:5000/ws/voice-stream?api_key=sk_test_123456789");
|
| 204 |
+
ws.onopen = () => {
|
| 205 |
+
ws.send(JSON.stringify({
|
| 206 |
+
type: "start",
|
| 207 |
+
audioFormat: "pcm16",
|
| 208 |
+
sampleRate: 16000,
|
| 209 |
+
channels: 1,
|
| 210 |
+
enablePartial: true
|
| 211 |
+
}));
|
| 212 |
+
// Send base64-encoded PCM16 chunks as they arrive
|
| 213 |
+
ws.send(JSON.stringify({ type: "audio_chunk", audioChunkBase64: chunkBase64 }));
|
| 214 |
+
ws.send(JSON.stringify({ type: "audio_chunk", audioChunkBase64: lastChunkBase64, final: true }));
|
| 215 |
+
};
|
| 216 |
+
ws.onmessage = (event) => console.log(event.data);
|
| 217 |
+
```
|
| 218 |
+
|
| 219 |
+
### Feedback (Self-Learning)
|
| 220 |
+
|
| 221 |
+
Send labeled audio samples so the model can periodically recalibrate.
|
| 222 |
+
|
| 223 |
+
```bash
|
| 224 |
+
curl -X POST http://localhost:5000/api/feedback \
|
| 225 |
+
-H "Content-Type: application/json" \
|
| 226 |
+
-H "x-api-key: sk_test_123456789" \
|
| 227 |
+
-d '{
|
| 228 |
+
"label": "AI_GENERATED",
|
| 229 |
+
"audioFormat": "mp3",
|
| 230 |
+
"audioBase64": "'"$(base64 -w 0 new_ai_sample.mp3)"'"
|
| 231 |
+
}'
|
| 232 |
+
```
|
| 233 |
+
|
| 234 |
+
Stored samples are written to `data/feedback/<LABEL>/YYYYMMDD/` along with
|
| 235 |
+
metadata JSON files and an index.
|
| 236 |
+
|
| 237 |
+
### Train Calibration (Self-Learning)
|
| 238 |
+
|
| 239 |
+
This trains a lightweight calibration layer using feedback samples:
|
| 240 |
+
```bash
|
| 241 |
+
python self_learning_train.py --data-dir data/feedback --output data/calibration.json
|
| 242 |
+
```
|
| 243 |
+
|
| 244 |
+
If `CALIBRATION_PATH` exists, the API loads it on startup.
|
| 245 |
+
|
| 246 |
+
When retraining, the script will automatically archive the previous calibration
|
| 247 |
+
to `CALIBRATION_HISTORY_DIR` before writing the new file.
|
| 248 |
+
|
| 249 |
+
Reload calibration without restarting the API:
|
| 250 |
+
```bash
|
| 251 |
+
curl -X POST http://localhost:5000/api/reload-calibration \
|
| 252 |
+
-H "x-api-key: sk_test_123456789"
|
| 253 |
+
```
|
| 254 |
+
|
| 255 |
+
Backup the current calibration (creates a timestamped copy):
|
| 256 |
+
```bash
|
| 257 |
+
curl -X POST http://localhost:5000/api/backup-calibration \
|
| 258 |
+
-H "x-api-key: sk_test_123456789" \
|
| 259 |
+
-d '{"reason": "pre_retrain"}'
|
| 260 |
+
```
|
| 261 |
+
|
| 262 |
+
List calibration history:
|
| 263 |
+
```bash
|
| 264 |
+
curl -X GET http://localhost:5000/api/calibration-history \
|
| 265 |
+
-H "x-api-key: sk_test_123456789"
|
| 266 |
+
```
|
| 267 |
+
|
| 268 |
+
Rollback to a previous calibration:
|
| 269 |
+
```bash
|
| 270 |
+
curl -X POST http://localhost:5000/api/rollback-calibration \
|
| 271 |
+
-H "x-api-key: sk_test_123456789" \
|
| 272 |
+
-d '{"versionId": "20260207T120000Z_ab12cd34"}'
|
| 273 |
+
```
|
| 274 |
+
|
| 275 |
+
## 📊 Response Format
|
| 276 |
+
|
| 277 |
+
### Success Response
|
| 278 |
+
```json
|
| 279 |
+
{
|
| 280 |
+
"status": "success",
|
| 281 |
+
"language": "English",
|
| 282 |
+
"classification": "AI_GENERATED",
|
| 283 |
+
"confidenceScore": 0.91,
|
| 284 |
+
"explanation": "Deep learning model detected synthetic voice patterns (confidence: 92.5%)"
|
| 285 |
+
}
|
| 286 |
+
```
|
| 287 |
+
|
| 288 |
+
### Error Response
|
| 289 |
+
```json
|
| 290 |
+
{
|
| 291 |
+
"status": "error",
|
| 292 |
+
"message": "Invalid API key"
|
| 293 |
+
}
|
| 294 |
+
```
|
| 295 |
+
|
| 296 |
+
## 🔑 Authentication
|
| 297 |
+
|
| 298 |
+
All requests to `/api/voice-detection` require an API key in the header:
|
| 299 |
+
|
| 300 |
+
```
|
| 301 |
+
x-api-key: your_api_key_here
|
| 302 |
+
```
|
| 303 |
+
|
| 304 |
+
**Setting API Key:**
|
| 305 |
+
```bash
|
| 306 |
+
# In .env file
|
| 307 |
+
API_KEY=sk_test_123456789
|
| 308 |
+
|
| 309 |
+
# Or as environment variable
|
| 310 |
+
export API_KEY="your_secure_key"
|
| 311 |
+
```
|
| 312 |
+
|
| 313 |
+
## 🧪 Testing
|
| 314 |
+
|
| 315 |
+
### Run Test Suite
|
| 316 |
+
```bash
|
| 317 |
+
pytest
|
| 318 |
+
```
|
| 319 |
+
|
| 320 |
+
### Integration Tests (full model)
|
| 321 |
+
```bash
|
| 322 |
+
RUN_MODEL_TESTS=true pytest -m integration
|
| 323 |
+
```
|
| 324 |
+
Set `AI_MISS_AUDIO_PATH` to point at a known false-negative AI sample to
|
| 325 |
+
track improvements after recalibration.
|
| 326 |
+
|
| 327 |
+
### Manual Testing
|
| 328 |
+
```bash
|
| 329 |
+
# Health check
|
| 330 |
+
curl http://localhost:5000/health
|
| 331 |
+
|
| 332 |
+
# Test with sample audio
|
| 333 |
+
python client.py --audio test_audio.mp3
|
| 334 |
+
```
|
| 335 |
+
|
| 336 |
+
## 📝 Supported Features
|
| 337 |
+
|
| 338 |
+
### Languages
|
| 339 |
+
- Tamil
|
| 340 |
+
- English
|
| 341 |
+
- Hindi
|
| 342 |
+
- Malayalam
|
| 343 |
+
- Telugu
|
| 344 |
+
|
| 345 |
+
### Classifications
|
| 346 |
+
- `AI_GENERATED` - Synthetic/AI voice
|
| 347 |
+
- `HUMAN` - Real human voice
|
| 348 |
+
|
| 349 |
+
### Audio Requirements
|
| 350 |
+
- Format: MP3 only
|
| 351 |
+
- Input: Base64 encoded
|
| 352 |
+
- Max duration: 30 seconds (auto-truncated)
|
| 353 |
+
|
| 354 |
+
## ⚙️ Configuration
|
| 355 |
+
|
| 356 |
+
### Environment Variables
|
| 357 |
+
|
| 358 |
+
| Variable | Default | Description |
|
| 359 |
+
|----------|---------|-------------|
|
| 360 |
+
| `API_KEY` | `sk_test_123456789` | API authentication key |
|
| 361 |
+
| `PORT` | `5000` | Server port |
|
| 362 |
+
| `FLASK_ENV` | `production` | Flask environment |
|
| 363 |
+
| `ENABLE_STREAMING` | `true` | Enable WebSocket streaming endpoint |
|
| 364 |
+
| `STREAMING_MAX_BUFFER_SECONDS` | `30` | Max audio seconds buffered for streaming |
|
| 365 |
+
| `STREAMING_PARTIAL_INTERVAL_SECONDS` | `10` | Partial result interval for streaming |
|
| 366 |
+
| `STREAMING_PARTIAL_MODE` | `physics` | Partial mode: `full`, `physics`, or `dl` |
|
| 367 |
+
| `STREAMING_MAX_CHUNK_BYTES` | `2097152` | Max size per streaming chunk |
|
| 368 |
+
| `ENABLE_FEEDBACK_STORAGE` | `true` | Enable feedback storage for self-learning |
|
| 369 |
+
| `FEEDBACK_STORAGE_DIR` | `data/feedback` | Feedback storage directory |
|
| 370 |
+
| `FEEDBACK_MAX_BYTES` | `15728640` | Max feedback payload size |
|
| 371 |
+
| `CALIBRATION_PATH` | `data/calibration.json` | Calibration file path |
|
| 372 |
+
| `SKIP_MODEL_LOAD` | `false` | Skip loading models at startup (useful for tests) |
|
| 373 |
+
| `CALIBRATION_HISTORY_DIR` | `data/calibration_history` | Calibration backup directory |
|
| 374 |
+
| `CALIBRATION_HISTORY_MAX` | `50` | Max calibration backups retained |
|
| 375 |
+
|
| 376 |
+
### Model Configuration
|
| 377 |
+
|
| 378 |
+
Edit the detector initialization in `app.py`:
|
| 379 |
+
|
| 380 |
+
```python
|
| 381 |
+
detector = HybridEnsembleDetector(
|
| 382 |
+
physics_weight=0.4, # Physics model weight
|
| 383 |
+
dl_weight=0.6, # Deep learning weight
|
| 384 |
+
max_audio_duration=30 # Max seconds to process
|
| 385 |
+
)
|
| 386 |
+
```
|
| 387 |
+
|
| 388 |
+
## 🏗️ Architecture
|
| 389 |
+
|
| 390 |
+
### Detection Pipeline
|
| 391 |
+
|
| 392 |
+
1. **Audio Input** → Base64 MP3
|
| 393 |
+
2. **Preprocessing** → Decode, convert to 16kHz mono
|
| 394 |
+
3. **Language Detection** → Whisper model identifies language
|
| 395 |
+
4. **Physics Analysis** → Acoustic feature extraction
|
| 396 |
+
5. **Deep Learning** → Wav2Vec2 deepfake detection
|
| 397 |
+
6. **Ensemble** → Weighted combination of scores
|
| 398 |
+
7. **Classification** → AI_GENERATED or HUMAN
|
| 399 |
+
|
| 400 |
+
### Models Used
|
| 401 |
+
|
| 402 |
+
- **Deepfake Detector**: `garystafford/wav2vec2-deepfake-voice-detector`
|
| 403 |
+
- **Language Detector**: `openai/whisper-base`
|
| 404 |
+
|
| 405 |
+
## 📈 Performance
|
| 406 |
+
|
| 407 |
+
- **Processing Time**: 2-10 seconds per audio
|
| 408 |
+
- **Memory**: ~2GB RAM minimum
|
| 409 |
+
- **Accuracy**: Varies by language and audio quality
|
| 410 |
+
- **Throughput**: ~5-10 requests/minute per worker
|
| 411 |
+
|
| 412 |
+
## 🔧 Troubleshooting
|
| 413 |
+
|
| 414 |
+
### Models Not Loading
|
| 415 |
+
```bash
|
| 416 |
+
# Pre-download models
|
| 417 |
+
python -c "from transformers import AutoModelForAudioClassification; \
|
| 418 |
+
AutoModelForAudioClassification.from_pretrained('garystafford/wav2vec2-deepfake-voice-detector')"
|
| 419 |
+
```
|
| 420 |
+
|
| 421 |
+
### Port Already in Use
|
| 422 |
+
```bash
|
| 423 |
+
# Change port in .env
|
| 424 |
+
PORT=8000
|
| 425 |
+
|
| 426 |
+
# Or use environment variable
|
| 427 |
+
PORT=8000 python app.py
|
| 428 |
+
```
|
| 429 |
+
|
| 430 |
+
### Memory Issues
|
| 431 |
+
- Reduce `max_audio_duration` to 15 seconds
|
| 432 |
+
- Use fewer Docker workers
|
| 433 |
+
- Increase system RAM
|
| 434 |
+
|
| 435 |
+
## 📖 Documentation
|
| 436 |
+
|
| 437 |
+
- **Full Deployment Guide**: See [DEPLOYMENT.md](DEPLOYMENT.md)
|
| 438 |
+
- **API Reference**: See API section above
|
| 439 |
+
- **Model Details**: See `detector.py` comments
|
| 440 |
+
|
| 441 |
+
## 🛡️ Security Notes
|
| 442 |
+
|
| 443 |
+
- Never commit API keys to version control
|
| 444 |
+
- Use strong, random API keys in production
|
| 445 |
+
- Enable HTTPS/TLS for production deployments
|
| 446 |
+
- Implement rate limiting for production use
|
| 447 |
+
- Regularly update dependencies
|
| 448 |
+
|
| 449 |
+
## 🚀 Production Deployment
|
| 450 |
+
|
| 451 |
+
### Using Gunicorn
|
| 452 |
+
```bash
|
| 453 |
+
gunicorn --bind 0.0.0.0:5000 --workers 2 --timeout 120 app:app
|
| 454 |
+
```
|
| 455 |
+
|
| 456 |
+
### With Nginx Reverse Proxy
|
| 457 |
+
See [DEPLOYMENT.md](DEPLOYMENT.md) for Nginx configuration
|
| 458 |
+
|
| 459 |
+
### Cloud Platforms
|
| 460 |
+
- AWS: EC2 + Docker or Elastic Beanstalk
|
| 461 |
+
- Google Cloud: Cloud Run or Compute Engine
|
| 462 |
+
- Azure: App Service or Container Instances
|
| 463 |
+
- Heroku: Supports Python + Docker
|
| 464 |
+
|
| 465 |
+
## 📞 Support
|
| 466 |
+
|
| 467 |
+
For issues or questions:
|
| 468 |
+
1. Check [DEPLOYMENT.md](DEPLOYMENT.md)
|
| 469 |
+
2. Run test suite: `python test_api.py`
|
| 470 |
+
3. Check logs: `docker-compose logs`
|
| 471 |
+
|
| 472 |
+
## 📄 License
|
| 473 |
+
|
| 474 |
+
This project uses open-source models:
|
| 475 |
+
- Wav2Vec2: Apache 2.0
|
| 476 |
+
- Whisper: MIT
|
| 477 |
+
|
| 478 |
+
## 🙏 Credits
|
| 479 |
+
|
| 480 |
+
- **Models**: HuggingFace transformers
|
| 481 |
+
- **Framework**: Flask
|
| 482 |
+
- **Audio Processing**: Librosa, SoundFile
|
| 483 |
+
|
| 484 |
+
---
|
| 485 |
+
|
| 486 |
+
**Version**: 1.0.0
|
| 487 |
+
**Status**: Production Ready ✅
|
| 488 |
+
**Last Updated**: February 2026
|
app.py
ADDED
|
@@ -0,0 +1,1053 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Voice Detection API - Flask Application (HuggingFace Spaces Version)
|
| 3 |
+
Accepts Base64-encoded MP3 audio and returns AI vs Human classification
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
from flask import Flask, request, jsonify
|
| 7 |
+
from flask_cors import CORS
|
| 8 |
+
from flask_sock import Sock
|
| 9 |
+
from functools import wraps
|
| 10 |
+
import base64
|
| 11 |
+
import json
|
| 12 |
+
import os
|
| 13 |
+
import logging
|
| 14 |
+
import shutil
|
| 15 |
+
import tempfile
|
| 16 |
+
import uuid
|
| 17 |
+
import wave
|
| 18 |
+
from datetime import datetime
|
| 19 |
+
from urllib.parse import parse_qs
|
| 20 |
+
|
| 21 |
+
# Import the detector
|
| 22 |
+
from detector import HybridEnsembleDetector
|
| 23 |
+
|
| 24 |
+
# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Initialize Flask app
app = Flask(__name__)
CORS(app)  # allow cross-origin browser clients to call the API
sock = Sock(app)  # WebSocket support for the /ws/voice-stream endpoint

# Load API key from environment variable (HuggingFace Secrets)
# NOTE(review): the fallback 'sk_test_123456789' is a publicly visible default —
# confirm API_KEY is always set via secrets in production deployments.
API_KEY = os.environ.get('API_KEY', 'sk_test_123456789')
logger.info(f"API initialized with key: {API_KEY[:10]}...")
|
| 39 |
+
|
| 40 |
+
def parse_bool(value, default=False):
    """Coerce an arbitrary value to a boolean.

    None falls back to *default*; real booleans pass through unchanged;
    anything else is stringified and matched against common truthy
    spellings ("1", "true", "yes", "y", "on"), case-insensitively.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    truthy_spellings = {"1", "true", "yes", "y", "on"}
    return str(value).strip().lower() in truthy_spellings
|
| 46 |
+
|
| 47 |
+
# Streaming configuration
STREAMING_ENABLED = parse_bool(os.environ.get("ENABLE_STREAMING", "true"))
# Rolling-buffer cap (seconds) applied to pcm16 WebSocket streams.
STREAMING_MAX_BUFFER_SECONDS = int(os.environ.get("STREAMING_MAX_BUFFER_SECONDS", 30))
# How often an interim (partial) analysis may be emitted during a stream.
STREAMING_PARTIAL_INTERVAL_SECONDS = float(os.environ.get("STREAMING_PARTIAL_INTERVAL_SECONDS", 10))
STREAMING_PARTIAL_MODE = os.environ.get("STREAMING_PARTIAL_MODE", "physics").lower()
# Upper bound on a single WebSocket audio chunk (bytes).
STREAMING_MAX_CHUNK_BYTES = int(os.environ.get("STREAMING_MAX_CHUNK_BYTES", 2 * 1024 * 1024))
STREAMING_SUPPORTED_FORMATS = {"pcm16", "wav", "mp3"}

# Self-learning / feedback configuration
ENABLE_FEEDBACK_STORAGE = parse_bool(os.environ.get("ENABLE_FEEDBACK_STORAGE", "true"))
FEEDBACK_STORAGE_DIR = os.environ.get("FEEDBACK_STORAGE_DIR", "data/feedback")
# Per-sample upload size cap (bytes) for /api/feedback.
FEEDBACK_MAX_BYTES = int(os.environ.get("FEEDBACK_MAX_BYTES", 15 * 1024 * 1024))
CALIBRATION_PATH = os.environ.get("CALIBRATION_PATH", "data/calibration.json")
CALIBRATION_HISTORY_DIR = os.environ.get("CALIBRATION_HISTORY_DIR", "data/calibration_history")
# Maximum number of calibration snapshots retained; older ones are pruned.
CALIBRATION_HISTORY_MAX = int(os.environ.get("CALIBRATION_HISTORY_MAX", 50))

# Initialize the detector globally (load models once at startup)
logger.info("Loading AI detection models...")
detector = None
# When true, model loading is deferred to the first request instead of boot.
SKIP_MODEL_LOAD = parse_bool(os.environ.get("SKIP_MODEL_LOAD", "false"))
|
| 67 |
+
|
| 68 |
+
def init_detector():
    """Initialize the global HybridEnsembleDetector.

    The deepfake model location is read from the DEEPFAKE_MODEL_PATH
    environment variable (defaulting to a repo-relative directory) so the
    same code runs locally and on HuggingFace Spaces. The previous
    hard-coded absolute Windows path (``D:\\hackathons\\...``) only existed
    on one development machine and guaranteed startup failure on the
    deployment target this file is written for.

    Returns:
        bool: True when the detector loaded successfully, False otherwise
        (the failure is logged; callers fall back to lazy loading).
    """
    global detector
    # Configurable model location; the default matches the repo layout
    # referenced in .gitignore (wav2vec2-deepfake-voice-detector/).
    deepfake_model_path = os.environ.get(
        "DEEPFAKE_MODEL_PATH",
        "wav2vec2-deepfake-voice-detector"
    )
    try:
        detector = HybridEnsembleDetector(
            deepfake_model_path=deepfake_model_path,
            whisper_model_path="openai/whisper-base",
            physics_weight=0.4,
            dl_weight=0.6,
            use_local_deepfake_model=True,
            use_local_whisper_model=False,
            calibration_path=CALIBRATION_PATH,
            max_audio_duration=30
        )
        logger.info("✅ Detector initialized successfully")
        return True
    except Exception as e:
        logger.error(f"❌ Failed to initialize detector: {str(e)}")
        return False
|
| 87 |
+
|
| 88 |
+
# Initialize detector at startup
if SKIP_MODEL_LOAD:
    logger.info("⚠️ Skipping detector initialization (SKIP_MODEL_LOAD=true)")
elif not init_detector():
    # Non-fatal: voice_detection() retries via lazy init on first request.
    logger.warning("⚠️ API starting without detector - models will be loaded on first request")
|
| 93 |
+
|
| 94 |
+
|
| 95 |
+
# ==========================================================
|
| 96 |
+
# AUTHENTICATION DECORATOR
|
| 97 |
+
# ==========================================================
|
| 98 |
+
def require_api_key(f):
    """Decorator enforcing the shared API key on an endpoint.

    A missing 'x-api-key' header yields 401; a mismatching key yields
    403; both attempts are logged with the caller's address.
    """
    @wraps(f)
    def decorated_function(*args, **kwargs):
        supplied = request.headers.get('x-api-key')

        if not supplied:
            logger.warning(f"Request without API key from {request.remote_addr}")
            return jsonify({
                "status": "error",
                "message": "Missing API key. Please provide 'x-api-key' in request headers."
            }), 401

        if supplied != API_KEY:
            logger.warning(f"Invalid API key attempt from {request.remote_addr}")
            return jsonify({
                "status": "error",
                "message": "Invalid API key"
            }), 403

        return f(*args, **kwargs)

    return decorated_function
|
| 122 |
+
|
| 123 |
+
|
| 124 |
+
def get_ws_api_key(environ):
    """Extract an API key from a WebSocket WSGI environ.

    Checks, in order: the x-api-key header, a Bearer Authorization
    header, and an api_key query-string parameter. Returns None when
    no key is present.
    """
    if not environ:
        return None

    direct = environ.get("HTTP_X_API_KEY")
    if direct:
        return direct

    auth_header = environ.get("HTTP_AUTHORIZATION")
    if auth_header and auth_header.lower().startswith("bearer "):
        return auth_header.split(" ", 1)[1]

    keys = parse_qs(environ.get("QUERY_STRING", "")).get("api_key")
    if keys:
        return keys[0]

    return None
|
| 141 |
+
|
| 142 |
+
|
| 143 |
+
def normalize_label(label):
    """Map assorted label spellings onto canonical AI_GENERATED / HUMAN.

    Returns None for unrecognized or missing labels.
    """
    if label is None:
        return None
    canonical = str(label).strip().upper()
    if canonical in ("AI_GENERATED", "AI", "FAKE", "SYNTHETIC"):
        return "AI_GENERATED"
    if canonical in ("HUMAN", "REAL"):
        return "HUMAN"
    return None
|
| 152 |
+
|
| 153 |
+
|
| 154 |
+
def decode_audio_base64(audio_base64):
    """Decode a base64 audio payload, handling optional data: URI prefixes.

    Returns a tuple (raw_bytes, detected_format) where detected_format is
    "wav", "mp3", or None when the payload carried no recognizable MIME
    header.
    """
    detected_format = None
    if isinstance(audio_base64, str) and audio_base64.startswith("data:"):
        header, audio_base64 = audio_base64.split(",", 1)
        mime = header.lower()
        if "audio/wav" in mime or "audio/x-wav" in mime:
            detected_format = "wav"
        elif "audio/mpeg" in mime or "audio/mp3" in mime:
            detected_format = "mp3"
    return base64.b64decode(audio_base64), detected_format
|
| 165 |
+
|
| 166 |
+
|
| 167 |
+
def write_bytes_to_temp_file(data, suffix):
    """Persist *data* to a fresh temporary file and return its path.

    The file is left on disk (delete=False); the caller owns cleanup.
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as handle:
        handle.write(data)
        return handle.name
|
| 172 |
+
|
| 173 |
+
|
| 174 |
+
def write_pcm16_to_wav_file(pcm_bytes, sample_rate, channels):
    """Wrap raw 16-bit little-endian PCM samples in a WAV container.

    A trailing odd byte (an incomplete sample) is discarded. Returns the
    path of the temporary .wav file; the caller must delete it.
    """
    # Samples are 2 bytes wide; drop a dangling half-sample if present.
    usable = len(pcm_bytes) - (len(pcm_bytes) % 2)
    pcm_bytes = pcm_bytes[:usable]

    handle = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
    wav_path = handle.name
    handle.close()

    with wave.open(wav_path, "wb") as wav_out:
        wav_out.setnchannels(channels)
        wav_out.setsampwidth(2)  # 16-bit samples
        wav_out.setframerate(sample_rate)
        wav_out.writeframes(pcm_bytes)

    return wav_path
|
| 189 |
+
|
| 190 |
+
|
| 191 |
+
def format_detection_payload(result, requested_language=None):
    """Convert a detector result dict into the streaming/API response shape.

    Failed results collapse to {"status": "error", "message": ...};
    successful ones expose classification, confidence, explanation,
    detected language, and analysis mode, plus the caller's requested
    language when supplied.
    """
    if result.get("status") != "success":
        message = result.get("error") or result.get("message") or "Unknown error"
        return {"status": "error", "message": message}

    payload = {
        "status": "success",
        "classification": result.get("classification"),
        "confidenceScore": result.get("confidenceScore"),
        "explanation": result.get("explanation"),
        "detectedLanguage": result.get("language", "Unknown"),
        "analysisMode": result.get("analysisMode", "full"),
    }
    if requested_language:
        payload["requestedLanguage"] = requested_language
    return payload
|
| 211 |
+
|
| 212 |
+
|
| 213 |
+
def ensure_dir(path):
    """Create *path* (with parents) if given; a falsy path is a no-op."""
    if not path:
        return
    os.makedirs(path, exist_ok=True)
|
| 216 |
+
|
| 217 |
+
|
| 218 |
+
def build_calibration_version_id():
    """Return a unique, time-sortable version id.

    Format: UTC timestamp (YYYYMMDDTHHMMSSZ) + '_' + 8 random hex chars.
    """
    stamp = datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}_{uuid.uuid4().hex[:8]}"
|
| 222 |
+
|
| 223 |
+
|
| 224 |
+
def calibration_history_files():
    """List archived calibration snapshot paths, newest first.

    Matches calibration_*.json inside CALIBRATION_HISTORY_DIR while
    skipping the companion *.meta.json sidecar files.
    """
    if not os.path.isdir(CALIBRATION_HISTORY_DIR):
        return []

    snapshots = [
        os.path.join(CALIBRATION_HISTORY_DIR, name)
        for name in os.listdir(CALIBRATION_HISTORY_DIR)
        if name.startswith("calibration_")
        and name.endswith(".json")
        and not name.endswith(".meta.json")
    ]
    snapshots.sort(key=os.path.getmtime, reverse=True)
    return snapshots
|
| 236 |
+
|
| 237 |
+
|
| 238 |
+
def archive_calibration(reason=None):
    """Snapshot the live calibration file into the history directory.

    Copies CALIBRATION_PATH to a timestamped calibration_<version>.json,
    writes a .meta.json sidecar describing when and why, then prunes the
    oldest snapshots beyond CALIBRATION_HISTORY_MAX. Returns a dict with
    "versionId" and "path", or None when no calibration file exists.
    """
    if not os.path.exists(CALIBRATION_PATH):
        return None

    ensure_dir(CALIBRATION_HISTORY_DIR)
    version_id = build_calibration_version_id()
    dest_path = os.path.join(CALIBRATION_HISTORY_DIR, f"calibration_{version_id}.json")
    shutil.copy2(CALIBRATION_PATH, dest_path)

    # Sidecar metadata recording provenance of the snapshot.
    meta = {
        "versionId": version_id,
        "source": CALIBRATION_PATH,
        "archivedAt": datetime.utcnow().isoformat() + "Z",
        "reason": reason or "manual"
    }
    sidecar_path = os.path.join(CALIBRATION_HISTORY_DIR, f"calibration_{version_id}.meta.json")
    with open(sidecar_path, "w", encoding="utf-8") as handle:
        json.dump(meta, handle, indent=2)

    # Retention: keep only the newest CALIBRATION_HISTORY_MAX snapshots.
    if CALIBRATION_HISTORY_MAX > 0:
        for stale in calibration_history_files()[CALIBRATION_HISTORY_MAX:]:
            try:
                os.unlink(stale)
            except Exception:
                pass  # best-effort pruning
            stale_meta = stale.replace(".json", ".meta.json")
            if os.path.exists(stale_meta):
                try:
                    os.unlink(stale_meta)
                except Exception:
                    pass

    return {
        "versionId": version_id,
        "path": dest_path
    }
|
| 276 |
+
|
| 277 |
+
|
| 278 |
+
def list_calibration_history():
    """Describe every archived calibration snapshot, newest first.

    Each entry carries versionId, path, and (when the sidecar is present
    and readable) archivedAt / reason; unreadable sidecars degrade to
    None values rather than failing.
    """
    entries = []
    for path in calibration_history_files():
        version_id = os.path.basename(path).replace("calibration_", "").replace(".json", "")
        meta = {}
        sidecar = path.replace(".json", ".meta.json")
        if os.path.exists(sidecar):
            try:
                with open(sidecar, "r", encoding="utf-8") as handle:
                    meta = json.load(handle)
            except Exception:
                meta = {}  # tolerate corrupt sidecars
        entries.append({
            "versionId": version_id,
            "path": path,
            "archivedAt": meta.get("archivedAt"),
            "reason": meta.get("reason")
        })
    return entries
|
| 298 |
+
|
| 299 |
+
|
| 300 |
+
def resolve_history_path(version_id):
    """Map a calibration version id to its snapshot path (None if no id)."""
    if not version_id:
        return None
    return os.path.join(CALIBRATION_HISTORY_DIR, f"calibration_{version_id}.json")
|
| 305 |
+
|
| 306 |
+
|
| 307 |
+
class StreamSession:
    """Per-connection state for the realtime voice-stream WebSocket.

    Accumulates incoming audio chunks in a rolling buffer. For raw pcm16
    audio the buffer is capped at max_seconds (oldest bytes dropped) and
    partial analyses can be triggered on a timed interval; for mp3/wav
    the bytes are kept verbatim and durations are not tracked.
    """

    def __init__(
        self,
        audio_format,
        sample_rate,
        channels,
        max_seconds,
        enable_partial,
        partial_interval_seconds,
        partial_mode
    ):
        self.session_id = str(uuid.uuid4())
        self.audio_format = audio_format
        self.sample_rate = sample_rate
        self.channels = channels
        self.max_seconds = max_seconds
        self.enable_partial = enable_partial
        self.partial_interval_seconds = partial_interval_seconds
        self.partial_mode = partial_mode
        self.buffer = bytearray()          # rolling window of raw audio bytes
        self.total_bytes_received = 0      # lifetime byte count (never trimmed)
        self.total_seconds_received = 0.0  # lifetime duration (pcm16 only)
        self.last_partial_seconds = 0.0    # stream position of last partial run

    def _pcm_bytes_per_second(self):
        # 16-bit samples: 2 bytes per sample, per channel.
        return self.sample_rate * self.channels * 2

    def add_chunk(self, chunk_bytes):
        """Append a chunk; trim pcm16 buffers to the rolling window.

        Returns the buffered duration in seconds (pcm16) or None.
        """
        self.total_bytes_received += len(chunk_bytes)
        self.buffer.extend(chunk_bytes)

        if self.audio_format == "pcm16":
            rate = self._pcm_bytes_per_second()
            if rate > 0:
                self.total_seconds_received = self.total_bytes_received / rate
                cap = int(self.max_seconds * rate)
                if cap > 0 and len(self.buffer) > cap:
                    # Drop the oldest bytes so only the newest window remains.
                    del self.buffer[:len(self.buffer) - cap]

        return self.current_buffer_seconds()

    def current_buffer_seconds(self):
        """Duration of buffered audio in seconds (pcm16 only, else None)."""
        if self.audio_format != "pcm16":
            return None
        rate = self._pcm_bytes_per_second()
        if rate <= 0:
            return None
        return len(self.buffer) / rate

    def should_run_partial(self):
        """True when enough new pcm16 audio arrived for a partial analysis.

        Advances the partial checkpoint as a side effect when it fires.
        """
        if not self.enable_partial or self.audio_format != "pcm16":
            return False
        if self.partial_interval_seconds <= 0:
            return False
        if (self.total_seconds_received - self.last_partial_seconds) >= self.partial_interval_seconds:
            self.last_partial_seconds = self.total_seconds_received
            return True
        return False

    def write_temp_audio_file(self):
        """Dump the current buffer to a temp file; returns (path, format)."""
        if self.audio_format == "pcm16":
            return write_pcm16_to_wav_file(self.buffer, self.sample_rate, self.channels), "wav"

        suffix = ".mp3" if self.audio_format == "mp3" else ".wav"
        return write_bytes_to_temp_file(self.buffer, suffix), self.audio_format
|
| 372 |
+
|
| 373 |
+
|
| 374 |
+
# ==========================================================
|
| 375 |
+
# ROOT ENDPOINT (HuggingFace Spaces Homepage)
|
| 376 |
+
# ==========================================================
|
| 377 |
+
@app.route('/', methods=['GET'])
def home():
    """Root endpoint: static service metadata and an endpoint directory."""
    endpoint_map = {
        "health": "/health",
        "detection": "/api/voice-detection",
        "streaming": "/ws/voice-stream",
        "feedback": "/api/feedback",
        "reload_calibration": "/api/reload-calibration",
        "backup_calibration": "/api/backup-calibration",
        "rollback_calibration": "/api/rollback-calibration",
        "calibration_history": "/api/calibration-history"
    }
    info = {
        "name": "Voice Detection API",
        "version": "1.0.0",
        "description": "AI-powered voice detection system for identifying AI-generated vs human voices",
        "endpoints": endpoint_map,
        "supported_languages": ["Tamil", "English", "Hindi", "Malayalam", "Telugu"],
        "authentication": "Required - use 'x-api-key' header",
        "documentation": "See README for full API documentation"
    }
    return jsonify(info), 200
|
| 398 |
+
|
| 399 |
+
|
| 400 |
+
# ==========================================================
|
| 401 |
+
# HEALTH CHECK ENDPOINT
|
| 402 |
+
# ==========================================================
|
| 403 |
+
@app.route('/health', methods=['GET'])
def health_check():
    """Health check endpoint (no authentication required)."""
    calibration_ready = bool(
        detector and detector.calibrator and detector.calibrator.ready
    )
    return jsonify({
        "status": "healthy",
        "service": "Voice Detection API",
        "timestamp": datetime.utcnow().isoformat(),
        "models_loaded": detector is not None,
        "calibration_loaded": calibration_ready,
        "streaming_enabled": STREAMING_ENABLED,
        "platform": "HuggingFace Spaces"
    }), 200
|
| 415 |
+
|
| 416 |
+
|
| 417 |
+
# ==========================================================
|
| 418 |
+
# MAIN VOICE DETECTION ENDPOINT
|
| 419 |
+
# ==========================================================
|
| 420 |
+
@app.route('/api/voice-detection', methods=['POST'])
@require_api_key
def voice_detection():
    """
    Main voice detection endpoint

    Expected JSON Body:
    {
        "language": "Tamil" | "English" | "Hindi" | "Malayalam" | "Telugu",
        "audioFormat": "mp3",
        "audioBase64": "base64_encoded_audio_string"
    }

    Returns:
    {
        "status": "success",
        "language": "Tamil",
        "classification": "AI_GENERATED" | "HUMAN",
        "confidenceScore": 0.0-1.0,
        "explanation": "..."
    }

    Error responses use {"status": "error", "message": ...} with
    400 (bad request), 503 (models unavailable) or 500 (analysis/internal
    failure).
    """
    global detector

    try:
        # Validate Content-Type
        if not request.is_json:
            return jsonify({
                "status": "error",
                "message": "Content-Type must be application/json"
            }), 400

        # Get request data
        data = request.get_json()

        # Validate required fields
        required_fields = ['language', 'audioFormat', 'audioBase64']
        missing_fields = [field for field in required_fields if field not in data]

        if missing_fields:
            return jsonify({
                "status": "error",
                "message": f"Missing required fields: {', '.join(missing_fields)}"
            }), 400

        # Validate language
        supported_languages = ['Tamil', 'English', 'Hindi', 'Malayalam', 'Telugu']
        if data['language'] not in supported_languages:
            return jsonify({
                "status": "error",
                "message": f"Unsupported language. Must be one of: {', '.join(supported_languages)}"
            }), 400

        # Validate audio format (this REST endpoint accepts mp3 only; the
        # WebSocket stream additionally handles wav/pcm16)
        if data['audioFormat'].lower() != 'mp3':
            return jsonify({
                "status": "error",
                "message": "Only MP3 audio format is supported"
            }), 400

        # Validate base64 string
        # NOTE(review): the <100-char check is a heuristic minimum payload
        # size, not a base64 validity check — decoding errors surface later.
        audio_base64 = data['audioBase64']
        if not audio_base64 or len(audio_base64) < 100:
            return jsonify({
                "status": "error",
                "message": "Invalid or empty audio data"
            }), 400

        # Initialize detector if not already loaded (lazy init covers the
        # SKIP_MODEL_LOAD=true startup path and earlier init failures)
        if detector is None:
            logger.info("Lazy loading detector on first request...")
            if not init_detector():
                return jsonify({
                    "status": "error",
                    "message": "Failed to load AI detection models. Please try again later."
                }), 503

        # Log request
        logger.info(f"Processing voice detection request for language: {data['language']}")

        # Analyze audio
        result = detector.analyze(
            audio_base64,
            input_type="base64",
            audio_format=data['audioFormat']
        )

        # Check if analysis was successful
        if result['status'] != 'success':
            error_msg = result.get('error', 'Unknown error during analysis')
            logger.error(f"Analysis failed: {error_msg}")
            return jsonify({
                "status": "error",
                "message": f"Audio analysis failed: {error_msg}"
            }), 500

        # Prepare response (API compliant format - NO DEBUG INFO in production)
        response = {
            "status": "success",
            "language": data['language'],  # Use requested language from input
            "classification": result['classification'],
            "confidenceScore": result['confidenceScore'],
            "explanation": result['explanation']
        }

        logger.info(f"✅ Analysis complete: {result['classification']} (confidence: {result['confidenceScore']})")

        return jsonify(response), 200

    except Exception as e:
        # Catch-all boundary: log with traceback, return a generic 500
        # without leaking internals to the client.
        logger.error(f"Unexpected error in voice_detection: {str(e)}", exc_info=True)
        return jsonify({
            "status": "error",
            "message": "Internal server error occurred during processing"
        }), 500
|
| 535 |
+
|
| 536 |
+
|
| 537 |
+
# ==========================================================
|
| 538 |
+
# FEEDBACK / SELF-LEARNING ENDPOINT
|
| 539 |
+
# ==========================================================
|
| 540 |
+
@app.route('/api/feedback', methods=['POST'])
@require_api_key
def feedback():
    """
    Collect labeled audio samples for periodic self-learning.

    Expected JSON Body:
    {
        "label": "AI_GENERATED" | "HUMAN",
        "audioFormat": "mp3" | "wav",
        "audioBase64": "base64_encoded_audio_string",
        "runDetection": false,
        "metadata": { ... }
    }

    Stores the decoded audio under FEEDBACK_STORAGE_DIR/<label>/<date>/,
    writes a per-sample JSON sidecar, and appends the metadata to a
    shared index.jsonl. When runDetection is truthy, the sample is also
    scored by the detector and the scores are recorded in the metadata.
    """
    if not ENABLE_FEEDBACK_STORAGE:
        return jsonify({
            "status": "error",
            "message": "Feedback storage is disabled"
        }), 403

    if not request.is_json:
        return jsonify({
            "status": "error",
            "message": "Content-Type must be application/json"
        }), 400

    data = request.get_json()
    label = normalize_label(data.get("label"))
    if not label:
        return jsonify({
            "status": "error",
            "message": "Invalid label. Use AI_GENERATED or HUMAN."
        }), 400

    audio_format = str(data.get("audioFormat", "mp3")).lower()
    if audio_format not in ["mp3", "wav"]:
        return jsonify({
            "status": "error",
            "message": "audioFormat must be 'mp3' or 'wav'"
        }), 400

    audio_base64 = data.get("audioBase64")
    if not audio_base64 or len(audio_base64) < 100:
        return jsonify({
            "status": "error",
            "message": "Invalid or empty audio data"
        }), 400

    try:
        audio_bytes, detected_format = decode_audio_base64(audio_base64)
    except Exception as e:
        return jsonify({
            "status": "error",
            "message": f"Failed to decode audio: {str(e)}"
        }), 400

    # A data: URI MIME header, when present, overrides the declared format.
    if detected_format:
        audio_format = detected_format

    if len(audio_bytes) > FEEDBACK_MAX_BYTES:
        return jsonify({
            "status": "error",
            "message": "Audio payload exceeds maximum size"
        }), 413

    # Layout: FEEDBACK_STORAGE_DIR/<label>/<YYYYMMDD>/<uuid>.<ext>
    now = datetime.utcnow()
    date_dir = now.strftime("%Y%m%d")
    label_dir = os.path.join(FEEDBACK_STORAGE_DIR, label, date_dir)
    os.makedirs(label_dir, exist_ok=True)

    sample_id = str(uuid.uuid4())
    extension = ".mp3" if audio_format == "mp3" else ".wav"
    file_path = os.path.join(label_dir, f"{sample_id}{extension}")

    with open(file_path, "wb") as handle:
        handle.write(audio_bytes)

    metadata = {
        "id": sample_id,
        "label": label,
        "audio_format": audio_format,
        "created_at": now.isoformat() + "Z",
        "bytes": len(audio_bytes),
        "path": file_path,
        "client_metadata": data.get("metadata", {})
    }

    # Optionally score the sample now so training data carries model scores.
    if parse_bool(data.get("runDetection", False)):
        global detector
        if detector is None:
            logger.info("Lazy loading detector for feedback scoring...")
            if not init_detector():
                return jsonify({
                    "status": "error",
                    "message": "Failed to load AI detection models for scoring"
                }), 503

        scores = detector.extract_scores(file_path, input_type="file")
        if scores.get("status") == "success":
            metadata["physics_score"] = scores.get("physics_score")
            metadata["dl_score"] = scores.get("dl_score")
            metadata["dl_label"] = scores.get("dl_label")
            metadata["audio_duration"] = scores.get("audio_duration")
            metadata["was_truncated"] = scores.get("was_truncated")

    # Per-sample sidecar next to the audio file.
    meta_path = os.path.join(label_dir, f"{sample_id}.json")
    with open(meta_path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2)

    # Append-only global index for the self-learning trainer.
    index_path = os.path.join(FEEDBACK_STORAGE_DIR, "index.jsonl")
    with open(index_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(metadata) + "\n")

    return jsonify({
        "status": "success",
        "id": sample_id,
        "label": label,
        "audioFormat": audio_format,
        "stored": True
    }), 200
|
| 661 |
+
|
| 662 |
+
|
| 663 |
+
# ==========================================================
|
| 664 |
+
# CALIBRATION RELOAD ENDPOINT
|
| 665 |
+
# ==========================================================
|
| 666 |
+
@app.route('/api/reload-calibration', methods=['POST'])
@require_api_key
def reload_calibration():
    """Hot-reload the calibration file into the live detector."""
    global detector

    if detector is None:
        logger.info("Lazy loading detector for calibration reload...")
        if not init_detector():
            return jsonify({
                "status": "error",
                "message": "Failed to load AI detection models"
            }), 503

    if not detector.reload_calibration(CALIBRATION_PATH):
        return jsonify({
            "status": "error",
            "message": "Calibration file not found or invalid"
        }), 404

    return jsonify({
        "status": "success",
        "calibrationPath": detector.calibrator.calibration_path
    }), 200
|
| 690 |
+
|
| 691 |
+
|
| 692 |
+
@app.route('/api/backup-calibration', methods=['POST'])
@require_api_key
def backup_calibration():
    """Archive the current calibration file into the history directory."""
    body = request.get_json(silent=True) or {}

    if not os.path.exists(CALIBRATION_PATH):
        return jsonify({
            "status": "error",
            "message": "Calibration file not found"
        }), 404

    backup = archive_calibration(reason=body.get("reason") or "api_backup")
    if not backup:
        return jsonify({
            "status": "error",
            "message": "Failed to archive calibration"
        }), 500

    return jsonify({
        "status": "success",
        "versionId": backup["versionId"],
        "path": backup["path"]
    }), 200
|
| 716 |
+
|
| 717 |
+
|
| 718 |
+
@app.route('/api/calibration-history', methods=['GET'])
@require_api_key
def calibration_history():
    """Return metadata for all archived calibration versions."""
    return jsonify({
        "status": "success",
        "history": list_calibration_history()
    }), 200
|
| 726 |
+
|
| 727 |
+
|
| 728 |
+
@app.route('/api/rollback-calibration', methods=['POST'])
@require_api_key
def rollback_calibration():
    """Restore an archived calibration version and hot-reload it.

    Expected JSON Body: {"versionId": "<archived version id>"}.
    Copies the snapshot over CALIBRATION_PATH, then reloads it into the
    detector. 400 on missing versionId, 404 on unknown version, 503 if
    models cannot load, 500 if the restored file fails to load.
    """
    payload = request.get_json(silent=True) or {}
    version_id = payload.get("versionId")

    if not version_id:
        return jsonify({
            "status": "error",
            "message": "Missing versionId"
        }), 400

    source_path = resolve_history_path(version_id)
    if not source_path or not os.path.exists(source_path):
        return jsonify({
            "status": "error",
            "message": "Calibration version not found"
        }), 404

    # Overwrite the live calibration with the archived snapshot.
    ensure_dir(os.path.dirname(CALIBRATION_PATH))
    shutil.copy2(source_path, CALIBRATION_PATH)

    global detector
    if detector is None:
        logger.info("Lazy loading detector for rollback...")
        if not init_detector():
            return jsonify({
                "status": "error",
                "message": "Failed to load AI detection models"
            }), 503

    loaded = detector.reload_calibration(CALIBRATION_PATH)
    if not loaded:
        # The file was already copied; only the in-memory reload failed.
        return jsonify({
            "status": "error",
            "message": "Failed to load calibration after rollback"
        }), 500

    return jsonify({
        "status": "success",
        "versionId": version_id,
        "calibrationPath": CALIBRATION_PATH
    }), 200
|
| 771 |
+
|
| 772 |
+
|
| 773 |
+
# ==========================================================
|
| 774 |
+
# REALTIME STREAMING ENDPOINT (WEBSOCKET)
|
| 775 |
+
# ==========================================================
|
| 776 |
+
@sock.route('/ws/voice-stream')
def voice_stream(ws):
    """WebSocket handler for realtime voice-stream analysis.

    Protocol (JSON text frames):
      - ``{"type": "start", ...}``      configure a session (format/rate/...)
      - ``{"type": "audio_chunk", ...}`` append base64 audio; may set "final"
      - ``{"type": "stop"}``            finalize and run the full analysis
      - ``{"type": "ping"}``            liveness check, answered with "pong"

    The handler replies with "ack", "progress", "partial_result",
    "final_result", or "error" frames. The connection loop ends when the
    client disconnects, a fatal error occurs, or a final result is sent.
    """
    if not STREAMING_ENABLED:
        ws.send(json.dumps({
            "type": "error",
            "message": "Streaming is disabled"
        }))
        return

    # Authenticate once per connection using a key pulled from the WS environ.
    api_key = get_ws_api_key(ws.environ)
    if api_key != API_KEY:
        ws.send(json.dumps({
            "type": "error",
            "message": "Invalid API key"
        }))
        return

    session = None              # created on the first "start" message
    requested_language = None   # optional language hint from "start"

    while True:
        message = ws.receive()
        if message is None:
            # Client closed the socket.
            break

        try:
            payload = json.loads(message)
        except Exception:
            ws.send(json.dumps({
                "type": "error",
                "message": "Invalid JSON message"
            }))
            continue

        msg_type = payload.get("type")

        if msg_type == "start":
            # Only one session per connection is allowed.
            if session is not None:
                ws.send(json.dumps({
                    "type": "error",
                    "message": "Stream already started"
                }))
                continue

            # Normalize common PCM16 format aliases before validation.
            audio_format = str(payload.get("audioFormat", "pcm16")).lower()
            if audio_format in ["pcm_s16le", "s16le", "pcm16le"]:
                audio_format = "pcm16"
            if audio_format not in STREAMING_SUPPORTED_FORMATS:
                ws.send(json.dumps({
                    "type": "error",
                    "message": "Unsupported audioFormat for streaming"
                }))
                continue

            sample_rate = int(payload.get("sampleRate", 16000))
            channels = int(payload.get("channels", 1))
            # Raw PCM carries no header, so rate/channels must be sane.
            if audio_format == "pcm16":
                if sample_rate <= 0 or channels <= 0:
                    ws.send(json.dumps({
                        "type": "error",
                        "message": "sampleRate and channels must be positive for pcm16"
                    }))
                    continue
                if channels not in [1, 2]:
                    ws.send(json.dumps({
                        "type": "error",
                        "message": "channels must be 1 or 2 for pcm16"
                    }))
                    continue
            requested_language = payload.get("language")
            enable_partial = parse_bool(payload.get("enablePartial", True))
            partial_interval = float(payload.get("partialIntervalSec", STREAMING_PARTIAL_INTERVAL_SECONDS))
            max_seconds = int(payload.get("maxSeconds", STREAMING_MAX_BUFFER_SECONDS))
            # Partial analyses may run a cheaper mode; silently fall back
            # to "physics" for unknown values rather than rejecting.
            partial_mode = str(payload.get("partialMode", STREAMING_PARTIAL_MODE)).lower()
            if partial_mode not in ["full", "physics", "dl"]:
                partial_mode = "physics"

            session = StreamSession(
                audio_format=audio_format,
                sample_rate=sample_rate,
                channels=channels,
                max_seconds=max_seconds,
                enable_partial=enable_partial,
                partial_interval_seconds=partial_interval,
                partial_mode=partial_mode
            )

            # Echo back the effective settings so the client can confirm.
            ws.send(json.dumps({
                "type": "ack",
                "status": "ready",
                "sessionId": session.session_id,
                "streaming": {
                    "audioFormat": audio_format,
                    "sampleRate": sample_rate,
                    "channels": channels,
                    "maxSeconds": max_seconds,
                    "partialIntervalSec": partial_interval,
                    "partialMode": partial_mode,
                    "enablePartial": enable_partial
                }
            }))
            continue

        if msg_type == "ping":
            ws.send(json.dumps({"type": "pong"}))
            continue

        if msg_type not in ["audio_chunk", "stop"]:
            ws.send(json.dumps({
                "type": "error",
                "message": "Unsupported message type"
            }))
            continue

        # Both "audio_chunk" and "stop" require an active session.
        if session is None:
            ws.send(json.dumps({
                "type": "error",
                "message": "Stream not started"
            }))
            continue

        # "stop" is treated as a chunk-less final message.
        finalize_only = False
        if msg_type == "stop":
            payload["final"] = True
            finalize_only = True

        chunk_b64 = payload.get("audioChunkBase64")
        chunk_bytes = None
        if not chunk_b64:
            # A missing chunk is only an error for plain audio_chunk frames;
            # "stop" legitimately carries no audio.
            if not finalize_only:
                ws.send(json.dumps({
                    "type": "error",
                    "message": "Missing audioChunkBase64"
                }))
                continue
        else:
            try:
                chunk_bytes = base64.b64decode(chunk_b64)
            except Exception:
                ws.send(json.dumps({
                    "type": "error",
                    "message": "Invalid base64 audio chunk"
                }))
                continue

            if len(chunk_bytes) > STREAMING_MAX_CHUNK_BYTES:
                ws.send(json.dumps({
                    "type": "error",
                    "message": "Audio chunk exceeds maximum size"
                }))
                continue

            # Accumulate the chunk and report buffer state back to the client.
            buffer_seconds = session.add_chunk(chunk_bytes)
            ws.send(json.dumps({
                "type": "progress",
                "receivedBytes": session.total_bytes_received,
                "bufferBytes": len(session.buffer),
                "bufferSeconds": buffer_seconds
            }))

            # Periodic partial analysis while the stream is still running.
            if session.should_run_partial():
                if detector is None:
                    logger.info("Lazy loading detector for streaming...")
                    if not init_detector():
                        ws.send(json.dumps({
                            "type": "error",
                            "message": "Failed to load AI detection models"
                        }))
                        break

                # The detector consumes files, so snapshot the buffer to a
                # temp file and always clean it up afterwards.
                temp_path = None
                try:
                    temp_path, file_format = session.write_temp_audio_file()
                    result = detector.analyze(
                        temp_path,
                        input_type="file",
                        audio_format=file_format,
                        analysis_mode=session.partial_mode
                    )
                    ws.send(json.dumps({
                        "type": "partial_result",
                        "result": format_detection_payload(result, requested_language=requested_language)
                    }))
                finally:
                    if temp_path and os.path.exists(temp_path):
                        try:
                            os.unlink(temp_path)
                        except Exception:
                            pass

        # Final pass: runs for "stop" or for an audio_chunk flagged final.
        if parse_bool(payload.get("final", False)):
            if not session.buffer:
                ws.send(json.dumps({
                    "type": "error",
                    "message": "No audio received"
                }))
                break

            if detector is None:
                logger.info("Lazy loading detector for streaming...")
                if not init_detector():
                    ws.send(json.dumps({
                        "type": "error",
                        "message": "Failed to load AI detection models"
                    }))
                    break

            # The final analysis always uses the "full" mode regardless of
            # the partial_mode configured at session start.
            temp_path = None
            try:
                temp_path, file_format = session.write_temp_audio_file()
                result = detector.analyze(
                    temp_path,
                    input_type="file",
                    audio_format=file_format,
                    analysis_mode="full"
                )
                ws.send(json.dumps({
                    "type": "final_result",
                    "result": format_detection_payload(result, requested_language=requested_language)
                }))
            finally:
                if temp_path and os.path.exists(temp_path):
                    try:
                        os.unlink(temp_path)
                    except Exception:
                        pass
            break
|
| 1003 |
+
|
| 1004 |
+
|
| 1005 |
+
# ==========================================================
|
| 1006 |
+
# ERROR HANDLERS
|
| 1007 |
+
# ==========================================================
|
| 1008 |
+
@app.errorhandler(404)
def not_found(error):
    """Respond with a JSON body when no route matches the request."""
    body = {
        "status": "error",
        "message": "Endpoint not found"
    }
    return jsonify(body), 404
|
| 1015 |
+
|
| 1016 |
+
|
| 1017 |
+
@app.errorhandler(405)
def method_not_allowed(error):
    """Respond with a JSON body when the HTTP method is unsupported."""
    body = {
        "status": "error",
        "message": "Method not allowed for this endpoint"
    }
    return jsonify(body), 405
|
| 1024 |
+
|
| 1025 |
+
|
| 1026 |
+
@app.errorhandler(500)
def internal_error(error):
    """Log the unexpected failure, then respond with a generic JSON error."""
    logger.error(f"Internal server error: {str(error)}")
    body = {
        "status": "error",
        "message": "Internal server error"
    }
    return jsonify(body), 500
|
| 1034 |
+
|
| 1035 |
+
|
| 1036 |
+
# ==========================================================
|
| 1037 |
+
# RUN APPLICATION
|
| 1038 |
+
# ==========================================================
|
| 1039 |
+
if __name__ == '__main__':
    # HuggingFace Spaces routes traffic to port 7860 by default;
    # allow PORT to override for local runs.
    port = int(os.environ.get('PORT', 7860))

    logger.info(f"🚀 Starting Voice Detection API on port {port}")
    logger.info(f"📍 Endpoint: http://0.0.0.0:{port}/api/voice-detection")
    # SECURITY FIX: never write the raw API key to the logs — anyone with
    # log access could replay it. Only record whether a key is configured.
    logger.info(f"🔑 API Key configured: {'yes' if API_KEY else 'no'}")
    logger.info("🌐 Platform: HuggingFace Spaces")

    # Run the development server; debug stays off in production.
    app.run(
        host='0.0.0.0',
        port=port,
        debug=False  # Always False in production
    )
|
client.py
ADDED
|
@@ -0,0 +1,209 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Example Client Script for Voice Detection API
|
| 3 |
+
Demonstrates how to use the API from Python
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import requests
|
| 7 |
+
import base64
|
| 8 |
+
import json
|
| 9 |
+
import argparse
|
| 10 |
+
from pathlib import Path
|
| 11 |
+
|
| 12 |
+
class VoiceDetectionClient:
    """Client for interacting with the Voice Detection API."""

    def __init__(self, api_url, api_key):
        """
        Initialize the client.

        Args:
            api_url: Base URL of the API (e.g., http://localhost:5000);
                a trailing slash is stripped so paths join cleanly.
            api_key: API authentication key sent in the x-api-key header.
        """
        self.api_url = api_url.rstrip('/')
        self.api_key = api_key
        self.headers = {
            'Content-Type': 'application/json',
            'x-api-key': self.api_key
        }

    def check_health(self):
        """Check if the API is healthy.

        Returns:
            dict: The decoded /health response, or an error dict if the
            request fails for any reason.
        """
        try:
            response = requests.get(f"{self.api_url}/health", timeout=5)
            return response.json()
        except Exception as e:
            return {"status": "error", "message": str(e)}

    def detect_voice(self, audio_path, language="English"):
        """
        Detect if a voice recording is AI-generated or human.

        Args:
            audio_path: Path to an MP3 audio file.
            language: Language of the audio
                (Tamil/English/Hindi/Malayalam/Telugu).

        Returns:
            dict: API response, or an error dict for local failures
            (missing file, bad language, unreadable audio, network error).
        """
        # Validate file exists before doing any work.
        if not Path(audio_path).exists():
            return {"status": "error", "message": f"File not found: {audio_path}"}

        # Validate language client-side so bad requests never hit the API.
        supported_languages = ['Tamil', 'English', 'Hindi', 'Malayalam', 'Telugu']
        if language not in supported_languages:
            return {
                "status": "error",
                "message": f"Unsupported language. Use: {', '.join(supported_languages)}"
            }

        # Read and base64-encode the audio for the JSON payload.
        try:
            with open(audio_path, 'rb') as f:
                audio_data = f.read()
            audio_base64 = base64.b64encode(audio_data).decode('utf-8')
        except Exception as e:
            return {"status": "error", "message": f"Failed to read audio file: {str(e)}"}

        payload = {
            "language": language,
            "audioFormat": "mp3",
            "audioBase64": audio_base64
        }

        # Send the request; map transport failures to error dicts so the
        # caller always receives a uniform shape.
        try:
            response = requests.post(
                f"{self.api_url}/api/voice-detection",
                headers=self.headers,
                json=payload,
                timeout=120  # 2 minutes timeout
            )

            return response.json()

        except requests.exceptions.Timeout:
            return {"status": "error", "message": "Request timed out"}
        except requests.exceptions.ConnectionError:
            return {"status": "error", "message": "Could not connect to API"}
        except Exception as e:
            return {"status": "error", "message": str(e)}

    def print_result(self, result):
        """Pretty-print an API response.

        BUG FIX: the previous version indexed result['language'],
        result['classification'], result['confidenceScore'] and
        result['explanation'] directly, so any success payload missing one
        of those keys crashed the client with a KeyError. All reads now use
        .get() with safe defaults.
        """
        print("\n" + "="*70)
        print("🎙️ VOICE DETECTION RESULT")
        print("="*70)

        if result.get('status') == 'success':
            classification = result.get('classification', 'UNKNOWN')
            confidence = float(result.get('confidenceScore', 0.0))

            print(f"✅ Status: {result['status'].upper()}")
            print(f"🌐 Language: {result.get('language', 'Unknown')}")
            print(f"🎯 Classification: {classification}")
            print(f"📊 Confidence Score: {confidence:.2f} / 1.00")
            print(f"💡 Explanation: {result.get('explanation', '')}")

            # Human-readable interpretation of the score bands.
            print("\n" + "-"*70)
            if classification == 'AI_GENERATED':
                print("⚠️ This voice appears to be AI-generated or synthetic")
                if confidence > 0.8:
                    print(" High confidence - Strong indicators of AI generation")
                elif confidence > 0.65:
                    print(" Medium confidence - Multiple suspicious patterns detected")
                else:
                    print(" Low confidence - Some indicators present but not conclusive")
            else:
                print("✅ This voice appears to be human/real")
                if confidence < 0.35:
                    print(" High confidence - Strong human characteristics")
                elif confidence < 0.5:
                    print(" Medium confidence - Mostly human patterns")
                else:
                    print(" Low confidence - Close to threshold")
        else:
            print(f"❌ Status: ERROR")
            print(f"💬 Message: {result.get('message', 'Unknown error')}")

        print("="*70 + "\n")
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
def main():
    """CLI entry point: parse arguments, then run a health check or
    classify one or more audio files via VoiceDetectionClient."""
    parser = argparse.ArgumentParser(
        description='Voice Detection API Client',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Check API health
  python client.py --health

  # Detect single audio file
  python client.py --audio test_audio.mp3 --language English

  # Process multiple files
  python client.py --audio file1.mp3 --audio file2.mp3 --language Tamil

  # Use custom API URL and key
  python client.py --audio test.mp3 --url http://api.example.com --key your_api_key
    """
    )

    parser.add_argument(
        '--url',
        default='http://localhost:5000',
        help='API base URL (default: http://localhost:5000)'
    )

    # NOTE(review): shipping a default API key in a CLI tool is only safe
    # for local testing; confirm this is never a production credential.
    parser.add_argument(
        '--key',
        default='sk_test_123456789',
        help='API key (default: sk_test_123456789)'
    )

    parser.add_argument(
        '--health',
        action='store_true',
        help='Check API health'
    )

    # 'append' lets the flag be repeated to queue multiple files.
    parser.add_argument(
        '--audio',
        action='append',
        help='Path to MP3 audio file (can be used multiple times)'
    )

    parser.add_argument(
        '--language',
        default='English',
        choices=['Tamil', 'English', 'Hindi', 'Malayalam', 'Telugu'],
        help='Language of the audio (default: English)'
    )

    args = parser.parse_args()

    # Initialize client
    client = VoiceDetectionClient(args.url, args.key)

    # --health short-circuits: print the health payload and exit.
    if args.health:
        print("🏥 Checking API health...")
        health = client.check_health()
        print(json.dumps(health, indent=2))
        return

    # Process each requested audio file in order; otherwise show usage.
    if args.audio:
        for audio_file in args.audio:
            print(f"\n🎵 Processing: {audio_file}")
            print(f" Language: {args.language}")

            result = client.detect_voice(audio_file, args.language)
            client.print_result(result)
    else:
        parser.print_help()
|
| 206 |
+
|
| 207 |
+
|
| 208 |
+
# Script entry point: only run the CLI when executed directly.
if __name__ == '__main__':
    main()
|
detector.py
ADDED
|
@@ -0,0 +1,875 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import librosa
|
| 3 |
+
import numpy as np
|
| 4 |
+
import scipy.stats as stats
|
| 5 |
+
import torch.nn.functional as F
|
| 6 |
+
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor, WhisperProcessor, WhisperForConditionalGeneration
|
| 7 |
+
import base64
|
| 8 |
+
import io
|
| 9 |
+
import json
|
| 10 |
+
import math
|
| 11 |
+
import tempfile
|
| 12 |
+
import os
|
| 13 |
+
import soundfile as sf
|
| 14 |
+
import warnings
|
| 15 |
+
|
| 16 |
+
# Suppress librosa warnings
|
| 17 |
+
warnings.filterwarnings('ignore')
|
| 18 |
+
|
| 19 |
+
class ScoreCalibrator:
    """
    Small logistic calibration layer that maps the physics score and the
    deep-learning score onto a single calibrated probability.
    """

    def __init__(self, calibration_path=None):
        self.calibration_path = calibration_path
        self.ready = False
        self.weights = None
        self.bias = 0.0
        self.threshold = 0.5
        self.metadata = {}

        # Load eagerly when a path is supplied at construction time.
        if calibration_path:
            self.load(calibration_path)

    def load(self, path=None):
        """Read calibration weights from a JSON file.

        Falls back to the stored calibration_path when no path is given.
        Returns True on success; on any failure the calibrator is marked
        not ready and False is returned.
        """
        target = path or self.calibration_path
        if not target or not os.path.exists(target):
            self.ready = False
            return False

        try:
            with open(target, "r", encoding="utf-8") as fh:
                payload = json.load(fh)
        except Exception:
            # Unreadable or malformed file — keep the calibrator disabled.
            self.ready = False
            return False

        raw_weights = payload.get("weights")
        has_valid_weights = isinstance(raw_weights, list) and len(raw_weights) == 2
        if not has_valid_weights:
            self.ready = False
            return False

        self.weights = [float(raw_weights[0]), float(raw_weights[1])]
        self.bias = float(payload.get("bias", 0.0))
        self.threshold = float(payload.get("threshold", 0.5))
        self.metadata = payload
        self.calibration_path = target
        self.ready = True
        return True

    def predict(self, physics_score, dl_score):
        """Return the calibrated probability, or None when not loaded."""
        if not self.ready or self.weights is None:
            return None

        logit = self.bias + self.weights[0] * physics_score + self.weights[1] * dl_score
        # Numerically stable sigmoid: pick the branch whose exp() argument
        # is non-positive so it can never overflow.
        if logit < 0:
            scaled = math.exp(logit)
            return float(scaled / (1.0 + scaled))
        return float(1.0 / (1.0 + math.exp(-logit)))
|
| 74 |
+
|
| 75 |
+
class HybridEnsembleDetector:
|
| 76 |
+
"""
|
| 77 |
+
Hybrid AI Voice Detection System with Language Detection
|
| 78 |
+
|
| 79 |
+
Features:
|
| 80 |
+
1. Physics-based acoustic analysis
|
| 81 |
+
2. Deep Learning deepfake detection
|
| 82 |
+
3. Language identification using Whisper (focus on Indian languages)
|
| 83 |
+
4. Auto-truncation to 30 seconds for faster processing
|
| 84 |
+
"""
|
| 85 |
+
|
| 86 |
+
    def __init__(
        self,
        deepfake_model_path="garystafford/wav2vec2-deepfake-voice-detector",
        whisper_model_path="openai/whisper-base",
        physics_weight=0.4,
        dl_weight=0.6,
        use_local_deepfake_model=False,
        use_local_whisper_model=False,
        calibration_path=None,
        max_audio_duration=30  # seconds
    ):
        """
        Initialize the hybrid detector.

        Loads (1) a wav2vec2 deepfake-classification model and (2) a Whisper
        model used for language identification. Either load may fail: a DL
        failure degrades to physics-only scoring (dl_weight forced to 0),
        and a Whisper failure disables language detection (lang_ready=False).

        Args:
            deepfake_model_path: HF hub id or local path of the deepfake model
            whisper_model_path: HF hub id or local path of the Whisper model
            physics_weight: Weight for physics score (0-1); normalized below
            dl_weight: Weight for DL score (0-1); normalized below
            use_local_deepfake_model: Load deepfake model with local_files_only
            use_local_whisper_model: Load Whisper with local_files_only
            calibration_path: Optional path to calibration JSON file
            max_audio_duration: Maximum audio duration to process (seconds)
        """
        # Prefer GPU when available; all models are moved to this device.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.max_duration = max_audio_duration

        # Normalize weights so they always sum to 1 regardless of inputs.
        total_weight = physics_weight + dl_weight
        self.physics_weight = physics_weight / total_weight
        self.dl_weight = dl_weight / total_weight

        # Optional logistic calibration over (physics, dl) scores.
        self.calibrator = ScoreCalibrator(calibration_path)
        if self.calibrator.ready:
            print(f" Calibration loaded from: {self.calibrator.calibration_path}")

        print(f"🔧 Initializing Hybrid Detector with Language Detection")
        print(f" Device: {self.device}")
        print(f" Physics Weight: {self.physics_weight*100:.0f}%")
        print(f" DL Weight: {self.dl_weight*100:.0f}%")
        print(f" Max Audio Duration: {self.max_duration}s")

        # --- LOAD DEEPFAKE DETECTION MODEL ---
        try:
            print(f"📥 Loading deepfake detection model from '{deepfake_model_path}'...")

            if use_local_deepfake_model:
                # local_files_only avoids any network access at load time.
                self.dl_model = AutoModelForAudioClassification.from_pretrained(
                    deepfake_model_path,
                    local_files_only=True
                )
                self.feature_extractor = AutoFeatureExtractor.from_pretrained(
                    deepfake_model_path,
                    local_files_only=True
                )
            else:
                self.dl_model = AutoModelForAudioClassification.from_pretrained(deepfake_model_path)
                self.feature_extractor = AutoFeatureExtractor.from_pretrained(deepfake_model_path)

            self.dl_model.to(self.device)
            self.dl_model.eval()  # inference mode only
            self.dl_ready = True
            print("✅ Deepfake Detection Model Loaded")

        except Exception as e:
            # Graceful degradation: score with physics features alone.
            print(f"⚠️ DL Model Load Failed: {e}")
            print(" Running in Physics-Only mode")
            self.dl_ready = False
            self.dl_weight = 0
            self.physics_weight = 1.0

        # --- LOAD WHISPER FOR LANGUAGE DETECTION ---
        try:
            print(f"📥 Loading Whisper model for language detection from '{whisper_model_path}'...")

            if use_local_whisper_model:
                self.whisper_processor = WhisperProcessor.from_pretrained(
                    whisper_model_path,
                    local_files_only=True
                )
                self.whisper_model = WhisperForConditionalGeneration.from_pretrained(
                    whisper_model_path,
                    local_files_only=True
                )
            else:
                self.whisper_processor = WhisperProcessor.from_pretrained(whisper_model_path)
                self.whisper_model = WhisperForConditionalGeneration.from_pretrained(whisper_model_path)

            self.whisper_model.to(self.device)
            self.whisper_model.eval()
            self.lang_ready = True
            print("✅ Whisper Language Detection Model Loaded")

            # Whisper language codes -> display names, covering Indian
            # languages plus English. Codes not listed here will need a
            # fallback wherever this map is consumed.
            self.language_map = {
                'hi': 'Hindi',
                'bn': 'Bengali',
                'te': 'Telugu',
                'mr': 'Marathi',
                'ta': 'Tamil',
                'gu': 'Gujarati',
                'kn': 'Kannada',
                'ml': 'Malayalam',
                'or': 'Odia',
                'pa': 'Punjabi',
                'as': 'Assamese',
                'ur': 'Urdu',
                'en': 'English',
                'ne': 'Nepali',
                'si': 'Sinhala',
                'sa': 'Sanskrit',
                'sd': 'Sindhi',
                'ks': 'Kashmiri'
            }

        except Exception as e:
            # NOTE(review): on this path self.language_map is never set —
            # confirm that no consumer reads it when lang_ready is False.
            print(f"⚠️ Whisper Model Load Failed: {e}")
            print(" Running without language detection")
            self.lang_ready = False

        # --- PHYSICS ENGINE PARAMETERS ---
        # Coefficient-of-variation cutoffs: below CV_AI_THRESHOLD leans AI,
        # above CV_HUMAN_THRESHOLD leans human (exact use is in the physics
        # analysis methods, not visible here).
        self.CV_AI_THRESHOLD = 0.20
        self.CV_HUMAN_THRESHOLD = 0.32
        self.INTENSITY_MIN_STD = 0.05
        self.INTENSITY_MAX_STD = 0.15

        print("✅ Hybrid Detector Ready\n")
|
| 213 |
+
|
| 214 |
+
def reload_calibration(self, calibration_path=None):
|
| 215 |
+
"""
|
| 216 |
+
Reload calibration weights from disk.
|
| 217 |
+
|
| 218 |
+
Args:
|
| 219 |
+
calibration_path: Optional override path
|
| 220 |
+
|
| 221 |
+
Returns:
|
| 222 |
+
bool: True if calibration loaded
|
| 223 |
+
"""
|
| 224 |
+
if self.calibrator is None:
|
| 225 |
+
self.calibrator = ScoreCalibrator(calibration_path)
|
| 226 |
+
return self.calibrator.ready
|
| 227 |
+
return self.calibrator.load(calibration_path)
|
| 228 |
+
|
| 229 |
+
# ==========================================================
|
| 230 |
+
# HELPER: Audio Preprocessing
|
| 231 |
+
# ==========================================================
|
| 232 |
+
def preprocess_audio(self, audio_path, target_sr=16000):
|
| 233 |
+
"""
|
| 234 |
+
Load and preprocess audio:
|
| 235 |
+
1. Load audio
|
| 236 |
+
2. Convert to mono
|
| 237 |
+
3. Truncate to max_duration if needed
|
| 238 |
+
4. Resample to target_sr
|
| 239 |
+
|
| 240 |
+
Args:
|
| 241 |
+
audio_path: Path to audio file
|
| 242 |
+
target_sr: Target sample rate
|
| 243 |
+
|
| 244 |
+
Returns:
|
| 245 |
+
tuple: (waveform_array, sample_rate, duration, was_truncated)
|
| 246 |
+
"""
|
| 247 |
+
try:
|
| 248 |
+
# Load audio
|
| 249 |
+
y, sr = librosa.load(audio_path, sr=None, mono=True)
|
| 250 |
+
|
| 251 |
+
# Calculate duration
|
| 252 |
+
duration = len(y) / sr
|
| 253 |
+
was_truncated = False
|
| 254 |
+
|
| 255 |
+
# Truncate if longer than max_duration
|
| 256 |
+
if duration > self.max_duration:
|
| 257 |
+
print(f" ⚠️ Audio is {duration:.1f}s, truncating to {self.max_duration}s")
|
| 258 |
+
max_samples = int(self.max_duration * sr)
|
| 259 |
+
y = y[:max_samples]
|
| 260 |
+
duration = self.max_duration
|
| 261 |
+
was_truncated = True
|
| 262 |
+
|
| 263 |
+
# Resample if needed
|
| 264 |
+
if sr != target_sr:
|
| 265 |
+
y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
|
| 266 |
+
sr = target_sr
|
| 267 |
+
|
| 268 |
+
return y, sr, duration, was_truncated
|
| 269 |
+
|
| 270 |
+
except Exception as e:
|
| 271 |
+
raise ValueError(f"Failed to preprocess audio: {str(e)}")
|
| 272 |
+
|
| 273 |
+
# ==========================================================
|
| 274 |
+
# HELPER: Base64 Decoding
|
| 275 |
+
# ==========================================================
|
| 276 |
+
def decode_base64_audio(self, base64_string, audio_format="mp3"):
|
| 277 |
+
"""
|
| 278 |
+
Decode base64 audio and save to temporary file
|
| 279 |
+
|
| 280 |
+
Args:
|
| 281 |
+
base64_string: Base64 encoded audio data
|
| 282 |
+
|
| 283 |
+
Returns:
|
| 284 |
+
str: Path to temporary audio file
|
| 285 |
+
"""
|
| 286 |
+
try:
|
| 287 |
+
detected_format = audio_format
|
| 288 |
+
if isinstance(base64_string, str) and base64_string.startswith("data:"):
|
| 289 |
+
header, base64_string = base64_string.split(",", 1)
|
| 290 |
+
header_lower = header.lower()
|
| 291 |
+
if "audio/wav" in header_lower or "audio/x-wav" in header_lower:
|
| 292 |
+
detected_format = "wav"
|
| 293 |
+
elif "audio/mpeg" in header_lower or "audio/mp3" in header_lower:
|
| 294 |
+
detected_format = "mp3"
|
| 295 |
+
|
| 296 |
+
# Decode base64
|
| 297 |
+
audio_data = base64.b64decode(base64_string)
|
| 298 |
+
|
| 299 |
+
file_suffix = ".wav" if str(detected_format).lower() in ["wav", "wave"] else ".mp3"
|
| 300 |
+
|
| 301 |
+
# Create temporary file
|
| 302 |
+
temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=file_suffix)
|
| 303 |
+
temp_file.write(audio_data)
|
| 304 |
+
temp_file.close()
|
| 305 |
+
|
| 306 |
+
return temp_file.name
|
| 307 |
+
|
| 308 |
+
except Exception as e:
|
| 309 |
+
raise ValueError(f"Failed to decode base64 audio: {str(e)}")
|
| 310 |
+
|
| 311 |
+
# ==========================================================
|
| 312 |
+
# LANGUAGE DETECTION
|
| 313 |
+
# ==========================================================
|
| 314 |
+
    def detect_language(self, audio_path):
        """
        Detect the spoken language in an audio file using the Whisper model.

        Runs Whisper's transcription pass and parses the language token
        (``<|xx|>``) out of the raw decoder output. Only languages present
        in ``self.language_map`` are recognized; otherwise falls back to
        "English" (if anything was transcribed) or "Unknown".

        Args:
            audio_path: Path to audio file.

        Returns:
            str: Detected language name, or "Unknown" when the Whisper
            model is unavailable or detection fails.
        """
        # Whisper was not loaded at init time — nothing to detect with.
        if not self.lang_ready:
            return "Unknown"

        try:
            # Whisper operates on 16 kHz mono; the first 30 s is enough
            # for language identification.
            audio, sr = librosa.load(audio_path, sr=16000, mono=True, duration=30)

            # Convert raw samples into Whisper's log-mel input features.
            input_features = self.whisper_processor(
                audio,
                sampling_rate=16000,
                return_tensors="pt"
            ).input_features

            input_features = input_features.to(self.device)

            # Generate without forcing a language so the model emits its
            # own language token in the special-token stream.
            with torch.no_grad():
                generated_ids = self.whisper_model.generate(
                    input_features,
                    task="transcribe",
                    return_dict_in_generate=True
                )

            # Decode WITH special tokens so the language marker survives.
            full_output = self.whisper_processor.batch_decode(
                generated_ids.sequences,
                skip_special_tokens=False
            )[0]

            # Output format: <|startoftranscript|><|en|><|transcribe|>...
            detected_lang = None

            # Look for two-letter language tokens in the form <|xx|>.
            import re
            lang_pattern = r'<\|([a-z]{2})\|>'
            matches = re.findall(lang_pattern, full_output)

            if matches:
                # First match that is a known language code wins — it is
                # the token right after <|startoftranscript|>.
                for match in matches:
                    if match in self.language_map:
                        detected_lang = match
                        break

            if detected_lang:
                lang_name = self.language_map.get(detected_lang, detected_lang.upper())
                print(f" 🌐 Detected Language: {lang_name} ({detected_lang})")
                return lang_name
            else:
                # Fallback: no recognizable language token. If Whisper
                # still produced text, assume English rather than failing.
                transcription = self.whisper_processor.batch_decode(
                    generated_ids.sequences,
                    skip_special_tokens=True
                )[0]

                if len(transcription.strip()) > 0:
                    print(f" 🌐 Detected Language: English (default)")
                    return "English"
                else:
                    return "Unknown"

        except Exception as e:
            # Language detection is best-effort: never let it break analysis.
            print(f" ⚠️ Language detection error: {str(e)}")
            return "Unknown"
|
| 392 |
+
|
| 393 |
+
def extract_scores(self, audio_input, input_type="file", audio_format="mp3"):
|
| 394 |
+
"""
|
| 395 |
+
Extract physics and deep learning scores without language detection.
|
| 396 |
+
|
| 397 |
+
Args:
|
| 398 |
+
audio_input: Either file path or base64 string
|
| 399 |
+
input_type: "file" or "base64"
|
| 400 |
+
audio_format: "mp3" or "wav" when using base64
|
| 401 |
+
|
| 402 |
+
Returns:
|
| 403 |
+
dict: Score details
|
| 404 |
+
"""
|
| 405 |
+
temp_file = None
|
| 406 |
+
try:
|
| 407 |
+
if input_type == "base64":
|
| 408 |
+
temp_file = self.decode_base64_audio(audio_input, audio_format=audio_format)
|
| 409 |
+
audio_path = temp_file
|
| 410 |
+
elif input_type == "file":
|
| 411 |
+
audio_path = audio_input
|
| 412 |
+
if not os.path.exists(audio_path):
|
| 413 |
+
return {
|
| 414 |
+
"status": "error",
|
| 415 |
+
"error": f"Audio file not found: {audio_path}"
|
| 416 |
+
}
|
| 417 |
+
else:
|
| 418 |
+
return {
|
| 419 |
+
"status": "error",
|
| 420 |
+
"error": f"Invalid input_type: {input_type}. Use 'file' or 'base64'"
|
| 421 |
+
}
|
| 422 |
+
|
| 423 |
+
phys_score, phys_method, phys_feats = self.get_physics_score(audio_path)
|
| 424 |
+
dl_score, dl_label = self.get_dl_score(audio_path)
|
| 425 |
+
|
| 426 |
+
return {
|
| 427 |
+
"status": "success",
|
| 428 |
+
"physics_score": float(phys_score),
|
| 429 |
+
"dl_score": float(dl_score),
|
| 430 |
+
"dl_label": dl_label,
|
| 431 |
+
"physics_method": phys_method,
|
| 432 |
+
"audio_duration": float(phys_feats.get("duration", 0)),
|
| 433 |
+
"was_truncated": bool(phys_feats.get("was_truncated", False))
|
| 434 |
+
}
|
| 435 |
+
except Exception as e:
|
| 436 |
+
return {
|
| 437 |
+
"status": "error",
|
| 438 |
+
"error": str(e)
|
| 439 |
+
}
|
| 440 |
+
finally:
|
| 441 |
+
if temp_file and os.path.exists(temp_file):
|
| 442 |
+
try:
|
| 443 |
+
os.unlink(temp_file)
|
| 444 |
+
except Exception:
|
| 445 |
+
pass
|
| 446 |
+
|
| 447 |
+
# ==========================================================
|
| 448 |
+
# PART A: PHYSICS ENGINE (FIXED)
|
| 449 |
+
# ==========================================================
|
| 450 |
+
def get_linear_score(self, val, min_val, max_val):
|
| 451 |
+
"""Linear interpolation for scoring"""
|
| 452 |
+
if val <= min_val:
|
| 453 |
+
return 1.0
|
| 454 |
+
if val >= max_val:
|
| 455 |
+
return 0.0
|
| 456 |
+
return 1.0 - ((val - min_val) / (max_val - min_val))
|
| 457 |
+
|
| 458 |
+
    def get_physics_score(self, audio_path):
        """
        Analyze audio using physics-based acoustic features.

        Combines pitch variability (PYIN), RMS energy variability, and
        spectral-centroid skew into a single AI-likeness score. If pitch
        tracking fails or yields too few voiced frames, falls back to a
        pitch-free feature set (energy, spectral skew, zero-crossing rate).

        Args:
            audio_path: Path to audio file.

        Returns:
            tuple: (ai_score, method, features_dict) where ai_score is in
            [0, 1] (higher = more AI-like) and method names the analysis
            path taken ("Physics Analysis", "Physics Analysis (Limited)",
            or an error string).
        """
        try:
            # Load audio at NATIVE sample rate (don't resample for physics analysis)
            y, sr = librosa.load(audio_path, sr=None, mono=True)

            # Calculate original duration
            duration = len(y) / sr
            was_truncated = False

            # Truncate if needed to bound analysis cost.
            if duration > self.max_duration:
                max_samples = int(self.max_duration * sr)
                y = y[:max_samples]
                duration = self.max_duration
                was_truncated = True

            print(f" 🔬 Running physics analysis on {duration:.1f}s audio at {sr}Hz")

            # Robust pitch tracking using PYIN
            try:
                f0, voiced_flag, voiced_probs = librosa.pyin(
                    y,
                    fmin=librosa.note_to_hz('C2'),  # ~65 Hz
                    fmax=librosa.note_to_hz('C7'),  # ~2093 Hz
                    sr=sr,
                    frame_length=2048
                )
                # PYIN returns NaN for unvoiced frames — keep voiced only.
                valid_f0 = f0[~np.isnan(f0)]
            except Exception as pitch_error:
                print(f" ⚠️ Pitch detection failed: {pitch_error}, using fallback method")
                # Fallback: use simpler pitch detection
                valid_f0 = np.array([])

            if len(valid_f0) < 10:  # Need at least 10 valid pitch points
                print(f" ⚠️ Insufficient pitch data ({len(valid_f0)} points), using alternative features")
                # Fall back to non-pitch features
                rms = librosa.feature.rms(y=y)[0]
                centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
                zcr = librosa.feature.zero_crossing_rate(y)[0]

                feats = {
                    'pitch_cv': 0.25,  # Neutral value
                    'intensity_std': np.std(rms),
                    'freq_skew': stats.skew(centroid),
                    'zcr_std': np.std(zcr),
                    'mean_pitch': 0,
                    'std_pitch': 0,
                    'duration': duration,
                    'was_truncated': was_truncated
                }

                # Score based on available features
                intensity_score = self.get_linear_score(
                    feats['intensity_std'],
                    self.INTENSITY_MIN_STD,
                    self.INTENSITY_MAX_STD
                )

                zcr_score = self.get_linear_score(
                    feats['zcr_std'],
                    0.01,
                    0.08
                )

                skew_score = self.get_linear_score(
                    abs(feats['freq_skew']),
                    0.1,
                    1.0
                )

                # Weighted combination (no pitch)
                final_score = (intensity_score * 0.5 + zcr_score * 0.2 + skew_score * 0.3)

                print(f" 🔬 Physics score (no pitch): {final_score:.3f}")
                return round(final_score, 3), "Physics Analysis (Limited)", feats

            # Full analysis with pitch
            rms = librosa.feature.rms(y=y)[0]
            centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

            mean_pitch = np.mean(valid_f0)
            std_pitch = np.std(valid_f0)

            # Calculate feature metrics. pitch_cv is the coefficient of
            # variation: low values indicate an unnaturally flat pitch.
            feats = {
                'pitch_cv': std_pitch / mean_pitch if mean_pitch > 0 else 0,
                'intensity_std': np.std(rms),
                'freq_skew': stats.skew(centroid),
                'mean_pitch': mean_pitch,
                'std_pitch': std_pitch,
                'duration': duration,
                'was_truncated': was_truncated
            }

            # Individual feature scores (higher = more AI-like)
            intensity_score = self.get_linear_score(
                feats['intensity_std'],
                self.INTENSITY_MIN_STD,
                self.INTENSITY_MAX_STD
            )

            pitch_score = self.get_linear_score(
                feats['pitch_cv'],
                self.CV_AI_THRESHOLD,
                self.CV_HUMAN_THRESHOLD
            )

            skew_score = self.get_linear_score(
                abs(feats['freq_skew']),
                0.1,
                1.0
            )

            # Weighted combination
            W_INTENSITY = 0.40
            W_PITCH = 0.40
            W_SKEW = 0.20

            base_score = (
                intensity_score * W_INTENSITY +
                pitch_score * W_PITCH +
                skew_score * W_SKEW
            )

            # Synergy bonus: if both intensity and pitch are suspicious,
            # bump the score (capped at 1.0).
            if intensity_score > 0.4 and pitch_score > 0.4:
                final_score = min(base_score + 0.15, 1.0)
            else:
                final_score = base_score

            print(f" 🔬 Physics score: {final_score:.3f} (intensity:{intensity_score:.2f}, pitch:{pitch_score:.2f})")
            return round(final_score, 3), "Physics Analysis", feats

        except Exception as e:
            # Never propagate: a physics failure degrades to a zero score
            # so the DL engine can still carry the ensemble.
            print(f" ❌ Physics analysis failed: {str(e)}")
            import traceback
            traceback.print_exc()
            return 0.0, f"Physics Error: {str(e)}", {'duration': 0, 'was_truncated': False}
|
| 602 |
+
|
| 603 |
+
# ==========================================================
|
| 604 |
+
# PART B: DEEP LEARNING ENGINE
|
| 605 |
+
# ==========================================================
|
| 606 |
+
def get_dl_score(self, audio_path):
|
| 607 |
+
"""
|
| 608 |
+
Analyze audio using deep learning model
|
| 609 |
+
|
| 610 |
+
Returns:
|
| 611 |
+
tuple: (ai_score, label)
|
| 612 |
+
"""
|
| 613 |
+
if not self.dl_ready:
|
| 614 |
+
return 0.0, "Model not loaded"
|
| 615 |
+
|
| 616 |
+
try:
|
| 617 |
+
# Load and preprocess audio
|
| 618 |
+
waveform_np, sr, duration, was_truncated = self.preprocess_audio(audio_path, target_sr=16000)
|
| 619 |
+
|
| 620 |
+
# Process with feature extractor
|
| 621 |
+
inputs = self.feature_extractor(
|
| 622 |
+
waveform_np,
|
| 623 |
+
sampling_rate=16000,
|
| 624 |
+
return_tensors="pt",
|
| 625 |
+
padding=True
|
| 626 |
+
)
|
| 627 |
+
|
| 628 |
+
# Move to device
|
| 629 |
+
inputs = {k: v.to(self.device) for k, v in inputs.items()}
|
| 630 |
+
|
| 631 |
+
# Run inference
|
| 632 |
+
with torch.no_grad():
|
| 633 |
+
outputs = self.dl_model(**inputs)
|
| 634 |
+
logits = outputs.logits
|
| 635 |
+
probs = F.softmax(logits, dim=-1)
|
| 636 |
+
|
| 637 |
+
# Get predictions
|
| 638 |
+
# Class 0: Real, Class 1: Fake
|
| 639 |
+
prob_real = probs[0][0].item()
|
| 640 |
+
prob_fake = probs[0][1].item()
|
| 641 |
+
|
| 642 |
+
# AI score is the fake probability
|
| 643 |
+
ai_score = prob_fake
|
| 644 |
+
|
| 645 |
+
label = "Fake/Deepfake" if prob_fake > 0.5 else "Real/Human"
|
| 646 |
+
|
| 647 |
+
return round(ai_score, 3), label
|
| 648 |
+
|
| 649 |
+
except Exception as e:
|
| 650 |
+
print(f" ❌ DL analysis failed: {str(e)}")
|
| 651 |
+
return 0.0, f"DL Error: {str(e)}"
|
| 652 |
+
|
| 653 |
+
# ==========================================================
|
| 654 |
+
# PART C: EXPLANATION GENERATOR
|
| 655 |
+
# ==========================================================
|
| 656 |
+
def generate_explanation(self, final_score, phys_score, dl_score, dl_label, phys_feats, ai_threshold=0.55):
|
| 657 |
+
"""
|
| 658 |
+
Generate human-readable explanation for the classification
|
| 659 |
+
|
| 660 |
+
Returns:
|
| 661 |
+
str: Explanation text
|
| 662 |
+
"""
|
| 663 |
+
explanations = []
|
| 664 |
+
|
| 665 |
+
if final_score > ai_threshold:
|
| 666 |
+
# AI GENERATED
|
| 667 |
+
|
| 668 |
+
# Deep Learning contributions
|
| 669 |
+
if dl_score > 0.55 and self.dl_ready:
|
| 670 |
+
if "Fake" in dl_label or "Deepfake" in dl_label:
|
| 671 |
+
explanations.append(
|
| 672 |
+
f"Deep learning model detected synthetic voice patterns "
|
| 673 |
+
f"(confidence: {dl_score*100:.1f}%)"
|
| 674 |
+
)
|
| 675 |
+
|
| 676 |
+
# Physics contributions
|
| 677 |
+
if phys_score > 0.55:
|
| 678 |
+
p_cv = phys_feats.get('pitch_cv', 0)
|
| 679 |
+
i_std = phys_feats.get('intensity_std', 0)
|
| 680 |
+
|
| 681 |
+
if i_std < 0.06:
|
| 682 |
+
explanations.append(
|
| 683 |
+
f"Unnaturally consistent energy levels detected "
|
| 684 |
+
f"(std: {i_std:.3f}, expected: >0.06)"
|
| 685 |
+
)
|
| 686 |
+
|
| 687 |
+
if p_cv < 0.22 and p_cv > 0:
|
| 688 |
+
explanations.append(
|
| 689 |
+
f"Robotic pitch modulation patterns "
|
| 690 |
+
f"(CV: {p_cv:.2f}, expected: >0.22)"
|
| 691 |
+
)
|
| 692 |
+
|
| 693 |
+
if not explanations or (i_std >= 0.06 and p_cv >= 0.22):
|
| 694 |
+
explanations.append(
|
| 695 |
+
"Acoustic parameters lack natural human variability"
|
| 696 |
+
)
|
| 697 |
+
|
| 698 |
+
if not explanations:
|
| 699 |
+
explanations.append(
|
| 700 |
+
"Voice exhibits characteristics consistent with AI generation"
|
| 701 |
+
)
|
| 702 |
+
|
| 703 |
+
else:
|
| 704 |
+
# HUMAN
|
| 705 |
+
explanations.append(
|
| 706 |
+
"Voice exhibits natural acoustic variability and human speech characteristics"
|
| 707 |
+
)
|
| 708 |
+
|
| 709 |
+
return "; ".join(explanations)
|
| 710 |
+
|
| 711 |
+
# ==========================================================
|
| 712 |
+
# PART D: MAIN ANALYSIS FUNCTION
|
| 713 |
+
# ==========================================================
|
| 714 |
+
def analyze(self, audio_input, input_type="file", audio_format="mp3", analysis_mode="full"):
|
| 715 |
+
"""
|
| 716 |
+
Main analysis function with configurable input types
|
| 717 |
+
|
| 718 |
+
Args:
|
| 719 |
+
audio_input: Either file path or base64 string
|
| 720 |
+
input_type: "file" or "base64"
|
| 721 |
+
audio_format: "mp3" or "wav" when using base64 input
|
| 722 |
+
analysis_mode: "full", "physics", or "dl"
|
| 723 |
+
|
| 724 |
+
Returns:
|
| 725 |
+
dict: Analysis results following API response format
|
| 726 |
+
"""
|
| 727 |
+
temp_file = None
|
| 728 |
+
|
| 729 |
+
try:
|
| 730 |
+
analysis_mode = (analysis_mode or "full")
|
| 731 |
+
analysis_mode = str(analysis_mode).lower().strip()
|
| 732 |
+
if analysis_mode not in ["full", "physics", "dl"]:
|
| 733 |
+
return {
|
| 734 |
+
"status": "error",
|
| 735 |
+
"error": f"Invalid analysis_mode: {analysis_mode}. Use 'full', 'physics', or 'dl'"
|
| 736 |
+
}
|
| 737 |
+
|
| 738 |
+
# Handle input type
|
| 739 |
+
if input_type == "base64":
|
| 740 |
+
temp_file = self.decode_base64_audio(audio_input, audio_format=audio_format)
|
| 741 |
+
audio_path = temp_file
|
| 742 |
+
elif input_type == "file":
|
| 743 |
+
audio_path = audio_input
|
| 744 |
+
if not os.path.exists(audio_path):
|
| 745 |
+
return {
|
| 746 |
+
"status": "error",
|
| 747 |
+
"error": f"Audio file not found: {audio_path}"
|
| 748 |
+
}
|
| 749 |
+
else:
|
| 750 |
+
return {
|
| 751 |
+
"status": "error",
|
| 752 |
+
"error": f"Invalid input_type: {input_type}. Use 'file' or 'base64'"
|
| 753 |
+
}
|
| 754 |
+
|
| 755 |
+
print(f"🎵 Analyzing: {os.path.basename(audio_path)}")
|
| 756 |
+
|
| 757 |
+
# 1. Detect Language
|
| 758 |
+
detected_language = "Unknown"
|
| 759 |
+
if analysis_mode == "full":
|
| 760 |
+
detected_language = self.detect_language(audio_path)
|
| 761 |
+
|
| 762 |
+
# 2. Run Physics Analysis
|
| 763 |
+
phys_score = 0.0
|
| 764 |
+
phys_method = "Physics Skipped"
|
| 765 |
+
phys_feats = {'duration': 0, 'was_truncated': False}
|
| 766 |
+
if analysis_mode in ["full", "physics"]:
|
| 767 |
+
phys_score, phys_method, phys_feats = self.get_physics_score(audio_path)
|
| 768 |
+
|
| 769 |
+
# 3. Run Deep Learning Analysis
|
| 770 |
+
dl_score = 0.0
|
| 771 |
+
dl_label = "DL Skipped"
|
| 772 |
+
if analysis_mode in ["full", "dl"]:
|
| 773 |
+
dl_score, dl_label = self.get_dl_score(audio_path)
|
| 774 |
+
|
| 775 |
+
# 4. Calculate weighted ensemble score
|
| 776 |
+
used_calibration = False
|
| 777 |
+
threshold = 0.55
|
| 778 |
+
|
| 779 |
+
if analysis_mode == "full" and self.calibrator and self.calibrator.ready:
|
| 780 |
+
calibrated_score = self.calibrator.predict(phys_score, dl_score)
|
| 781 |
+
if calibrated_score is not None:
|
| 782 |
+
final_score = calibrated_score
|
| 783 |
+
used_calibration = True
|
| 784 |
+
threshold = float(self.calibrator.threshold)
|
| 785 |
+
else:
|
| 786 |
+
final_score = (
|
| 787 |
+
self.physics_weight * phys_score +
|
| 788 |
+
self.dl_weight * dl_score
|
| 789 |
+
)
|
| 790 |
+
elif analysis_mode == "physics":
|
| 791 |
+
final_score = phys_score
|
| 792 |
+
elif analysis_mode == "dl":
|
| 793 |
+
final_score = dl_score
|
| 794 |
+
else:
|
| 795 |
+
final_score = (
|
| 796 |
+
self.physics_weight * phys_score +
|
| 797 |
+
self.dl_weight * dl_score
|
| 798 |
+
)
|
| 799 |
+
|
| 800 |
+
# Round to 2 decimal places
|
| 801 |
+
final_score = round(float(final_score), 2)
|
| 802 |
+
|
| 803 |
+
# 5. Determine classification
|
| 804 |
+
classification = "AI_GENERATED" if final_score > threshold else "HUMAN"
|
| 805 |
+
|
| 806 |
+
# 6. Generate explanation
|
| 807 |
+
explanation = self.generate_explanation(
|
| 808 |
+
final_score,
|
| 809 |
+
phys_score,
|
| 810 |
+
dl_score,
|
| 811 |
+
dl_label,
|
| 812 |
+
phys_feats,
|
| 813 |
+
ai_threshold=threshold
|
| 814 |
+
)
|
| 815 |
+
|
| 816 |
+
# 7. Return API-compliant response (ensure all values are JSON serializable)
|
| 817 |
+
return {
|
| 818 |
+
"status": "success",
|
| 819 |
+
"language": detected_language,
|
| 820 |
+
"classification": classification,
|
| 821 |
+
"confidenceScore": float(final_score), # Convert to Python float
|
| 822 |
+
"explanation": explanation,
|
| 823 |
+
"analysisMode": analysis_mode,
|
| 824 |
+
"debug": {
|
| 825 |
+
"physics_score": float(phys_score),
|
| 826 |
+
"dl_score": float(dl_score),
|
| 827 |
+
"dl_label": dl_label,
|
| 828 |
+
"physics_weight": f"{self.physics_weight*100:.0f}%",
|
| 829 |
+
"dl_weight": f"{self.dl_weight*100:.0f}%",
|
| 830 |
+
"analysis_mode": analysis_mode,
|
| 831 |
+
"used_calibration": used_calibration,
|
| 832 |
+
"calibration_threshold": float(threshold) if used_calibration else None,
|
| 833 |
+
"calibration_path": self.calibrator.calibration_path if used_calibration else None,
|
| 834 |
+
"audio_duration": float(phys_feats.get('duration', 0)),
|
| 835 |
+
"was_truncated": bool(phys_feats.get('was_truncated', False)),
|
| 836 |
+
"physics_features": {k: float(v) if isinstance(v, (np.floating, np.integer)) else v
|
| 837 |
+
for k, v in phys_feats.items()
|
| 838 |
+
if k not in ['duration', 'was_truncated']}
|
| 839 |
+
}
|
| 840 |
+
}
|
| 841 |
+
|
| 842 |
+
except Exception as e:
|
| 843 |
+
import traceback
|
| 844 |
+
return {
|
| 845 |
+
"status": "error",
|
| 846 |
+
"error": str(e),
|
| 847 |
+
"traceback": traceback.format_exc()
|
| 848 |
+
}
|
| 849 |
+
|
| 850 |
+
finally:
|
| 851 |
+
# Clean up temporary file
|
| 852 |
+
if temp_file and os.path.exists(temp_file):
|
| 853 |
+
try:
|
| 854 |
+
os.unlink(temp_file)
|
| 855 |
+
except:
|
| 856 |
+
pass
|
| 857 |
+
|
| 858 |
+
# ==========================================================
|
| 859 |
+
# UTILITY: Update Weights
|
| 860 |
+
# ==========================================================
|
| 861 |
+
def update_weights(self, physics_weight, dl_weight):
|
| 862 |
+
"""
|
| 863 |
+
Update ensemble weights dynamically
|
| 864 |
+
|
| 865 |
+
Args:
|
| 866 |
+
physics_weight: New physics weight (0-1)
|
| 867 |
+
dl_weight: New DL weight (0-1)
|
| 868 |
+
"""
|
| 869 |
+
total = physics_weight + dl_weight
|
| 870 |
+
self.physics_weight = physics_weight / total
|
| 871 |
+
self.dl_weight = dl_weight / total
|
| 872 |
+
|
| 873 |
+
print(f"⚙️ Weights updated:")
|
| 874 |
+
print(f" Physics: {self.physics_weight*100:.0f}%")
|
| 875 |
+
print(f" DL: {self.dl_weight*100:.0f}%")
|
download_models.py
ADDED
|
@@ -0,0 +1,92 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Pre-download Models Script
|
| 3 |
+
Downloads all required AI models before deployment
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import sys
|
| 7 |
+
import os
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
|
| 10 |
+
print("="*70)
|
| 11 |
+
print("Voice Detection API - Model Download Script")
|
| 12 |
+
print("="*70)
|
| 13 |
+
print()
|
| 14 |
+
|
| 15 |
+
# Check if we're in the right directory
|
| 16 |
+
if not Path("requirements.txt").exists():
|
| 17 |
+
print("ERROR: requirements.txt not found!")
|
| 18 |
+
print("Please run this script from the project root directory.")
|
| 19 |
+
sys.exit(1)
|
| 20 |
+
|
| 21 |
+
print("This script will download the following models:")
|
| 22 |
+
print("1. Wav2Vec2 Deepfake Detector (~1.2 GB)")
|
| 23 |
+
print("2. Whisper Base Language Model (~500 MB)")
|
| 24 |
+
print()
|
| 25 |
+
print("Total download size: ~1.7 GB")
|
| 26 |
+
print("This may take 5-15 minutes depending on your internet speed.")
|
| 27 |
+
print()
|
| 28 |
+
|
| 29 |
+
response = input("Continue? (y/n): ")
|
| 30 |
+
if response.lower() != 'y':
|
| 31 |
+
print("Download cancelled.")
|
| 32 |
+
sys.exit(0)
|
| 33 |
+
|
| 34 |
+
print()
|
| 35 |
+
print("="*70)
|
| 36 |
+
print("Step 1/2: Downloading Wav2Vec2 Deepfake Detector")
|
| 37 |
+
print("="*70)
|
| 38 |
+
|
| 39 |
+
try:
|
| 40 |
+
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
|
| 41 |
+
|
| 42 |
+
print("Downloading model...")
|
| 43 |
+
model = AutoModelForAudioClassification.from_pretrained(
|
| 44 |
+
'garystafford/wav2vec2-deepfake-voice-detector'
|
| 45 |
+
)
|
| 46 |
+
|
| 47 |
+
print("Downloading feature extractor...")
|
| 48 |
+
feature_extractor = AutoFeatureExtractor.from_pretrained(
|
| 49 |
+
'garystafford/wav2vec2-deepfake-voice-detector'
|
| 50 |
+
)
|
| 51 |
+
|
| 52 |
+
print("✅ Wav2Vec2 model downloaded successfully!")
|
| 53 |
+
print()
|
| 54 |
+
|
| 55 |
+
except Exception as e:
|
| 56 |
+
print(f"❌ Failed to download Wav2Vec2 model: {str(e)}")
|
| 57 |
+
print("Please check your internet connection and try again.")
|
| 58 |
+
sys.exit(1)
|
| 59 |
+
|
| 60 |
+
print("="*70)
|
| 61 |
+
print("Step 2/2: Downloading Whisper Language Detection Model")
|
| 62 |
+
print("="*70)
|
| 63 |
+
|
| 64 |
+
try:
|
| 65 |
+
from transformers import WhisperProcessor, WhisperForConditionalGeneration
|
| 66 |
+
|
| 67 |
+
print("Downloading processor...")
|
| 68 |
+
processor = WhisperProcessor.from_pretrained('openai/whisper-base')
|
| 69 |
+
|
| 70 |
+
print("Downloading model...")
|
| 71 |
+
model = WhisperForConditionalGeneration.from_pretrained('openai/whisper-base')
|
| 72 |
+
|
| 73 |
+
print("✅ Whisper model downloaded successfully!")
|
| 74 |
+
print()
|
| 75 |
+
|
| 76 |
+
except Exception as e:
|
| 77 |
+
print(f"❌ Failed to download Whisper model: {str(e)}")
|
| 78 |
+
print("Please check your internet connection and try again.")
|
| 79 |
+
sys.exit(1)
|
| 80 |
+
|
| 81 |
+
print("="*70)
|
| 82 |
+
print("✅ All models downloaded successfully!")
|
| 83 |
+
print("="*70)
|
| 84 |
+
print()
|
| 85 |
+
print("Models are cached in:", Path.home() / ".cache" / "huggingface")
|
| 86 |
+
print()
|
| 87 |
+
print("Next steps:")
|
| 88 |
+
print("1. The models will be automatically used by the API")
|
| 89 |
+
print("2. Start the API: python app.py")
|
| 90 |
+
print("3. Test the API: python test_api.py")
|
| 91 |
+
print()
|
| 92 |
+
print("="*70)
|
pytest.ini
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[pytest]
|
| 2 |
+
testpaths = tests
|
| 3 |
+
markers =
|
| 4 |
+
integration: tests that require full models and data
|
requirements.txt
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
Flask
|
| 2 |
+
flask-cors
|
| 3 |
+
flask-sock
|
| 4 |
+
Werkzeug
|
| 5 |
+
transformers
|
| 6 |
+
librosa
|
| 7 |
+
soundfile
|
| 8 |
+
scipy
|
| 9 |
+
numpy
|
| 10 |
+
pydub
|
| 11 |
+
python-dotenv
|
| 12 |
+
gunicorn
|
| 13 |
+
pytest
|
| 14 |
+
# Note: torch, torchaudio are handled in Dockerfile
|
self_learning_train.py
ADDED
|
@@ -0,0 +1,245 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Train a lightweight calibration model from feedback audio samples.
|
| 3 |
+
|
| 4 |
+
This script builds a simple logistic regression calibration layer that
|
| 5 |
+
maps physics and deep learning scores to a calibrated probability.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import argparse
|
| 9 |
+
import json
|
| 10 |
+
import os
|
| 11 |
+
import shutil
|
| 12 |
+
import sys
|
| 13 |
+
from datetime import datetime
|
| 14 |
+
|
| 15 |
+
import numpy as np
|
| 16 |
+
|
| 17 |
+
from detector import HybridEnsembleDetector
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
# Maps feedback directory names to binary training labels:
# 1 = synthetic/AI-generated audio, 0 = genuine human audio.
LABEL_MAP = {
    "AI_GENERATED": 1,
    "AI": 1,
    "FAKE": 1,
    "SYNTHETIC": 1,
    "HUMAN": 0,
    "REAL": 0
}
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
def sigmoid(z):
    """Numerically stable logistic function.

    The argument is clipped to [-30, 30] before exponentiating so that
    ``np.exp`` cannot overflow for extreme logits.
    """
    bounded = np.clip(z, -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-bounded))
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
def train_logreg(X, y, lr=0.5, epochs=300, l2=0.001):
    """Fit a logistic regression by full-batch gradient descent.

    Args:
        X: (n_samples, n_features) float design matrix.
        y: (n_samples,) array of 0/1 targets.
        lr: learning rate for the gradient step.
        epochs: number of full-batch gradient updates.
        l2: L2 penalty coefficient applied to the weights.

    Returns:
        (weights, bias) tuple: a float64 weight vector and a scalar bias.
    """
    weights = np.zeros(X.shape[1], dtype=np.float64)
    bias = 0.0
    sample_count = float(X.shape[0])

    for _ in range(epochs):
        logits = X.dot(weights) + bias
        probs = sigmoid(logits)
        residual = probs - y
        # L2 regularization is applied to the weights only, never the bias.
        weight_grad = (X.T.dot(residual) / sample_count) + (l2 * weights)
        bias_grad = residual.mean()
        weights -= lr * weight_grad
        bias -= lr * bias_grad

    return weights, bias
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
def best_threshold(y_true, y_prob):
    """Grid-search the decision threshold that maximizes F1.

    Scans 81 evenly spaced thresholds in [0.1, 0.9]; ties keep the
    earliest (lowest) threshold because only a strictly better F1
    replaces the incumbent.

    Args:
        y_true: (n,) array of 0/1 ground-truth labels.
        y_prob: (n,) array of predicted probabilities.

    Returns:
        (threshold, f1) for the best-scoring cut-off.
    """
    chosen_t = 0.5
    top_f1 = -1.0

    for candidate in np.linspace(0.1, 0.9, 81):
        predicted = (y_prob >= candidate).astype(int)
        tp = float(((predicted == 1) & (y_true == 1)).sum())
        fp = float(((predicted == 1) & (y_true == 0)).sum())
        fn = float(((predicted == 0) & (y_true == 1)).sum())
        # Epsilon terms guard against division by zero on degenerate splits.
        precision = tp / (tp + fp + 1e-9)
        recall = tp / (tp + fn + 1e-9)
        f1 = (2 * precision * recall) / (precision + recall + 1e-9)
        if f1 > top_f1:
            top_f1 = f1
            chosen_t = float(candidate)

    return chosen_t, top_f1
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
def iter_audio_files(data_dir, max_per_class=0):
    """Collect labeled audio samples from a feedback directory tree.

    Expects one subdirectory per label name in LABEL_MAP under
    ``data_dir`` and walks each recursively for .mp3/.wav files.
    A sidecar ``<name>.json`` next to an audio file may carry
    precomputed ``physics_score``/``dl_score`` values, which are
    copied onto the sample so re-scoring can be skipped.

    Args:
        data_dir: root directory containing label-named subfolders.
        max_per_class: cap per binary class; 0 means unlimited.

    Returns:
        List of dicts with at least ``path`` and ``label`` keys.
    """
    collected = []
    per_class = {0: 0, 1: 0}

    for label_name, label_value in LABEL_MAP.items():
        label_dir = os.path.join(data_dir, label_name)
        if not os.path.isdir(label_dir):
            continue

        for root, _, files in os.walk(label_dir):
            for name in files:
                if not name.lower().endswith((".mp3", ".wav")):
                    continue
                # Cap is per binary class, so aliases (AI/FAKE/...) share it.
                if max_per_class and per_class[label_value] >= max_per_class:
                    continue

                file_path = os.path.join(root, name)
                entry = {
                    "path": file_path,
                    "label": label_value
                }

                meta_path = os.path.splitext(file_path)[0] + ".json"
                if os.path.exists(meta_path):
                    try:
                        with open(meta_path, "r", encoding="utf-8") as handle:
                            meta = json.load(handle)
                            if "physics_score" in meta and "dl_score" in meta:
                                entry["physics_score"] = float(meta["physics_score"])
                                entry["dl_score"] = float(meta["dl_score"])
                    except Exception:
                        # Best effort: a corrupt sidecar only means the
                        # sample will be re-scored by the detector later.
                        pass

                collected.append(entry)
                per_class[label_value] += 1

    return collected
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
def main():
    """Train the calibration layer from stored feedback samples.

    Pipeline: parse CLI options -> gather labeled audio (with any
    precomputed scores) -> score the remainder with the hybrid
    detector -> fit a 2-feature logistic regression -> pick the
    F1-optimal threshold -> archive the previous calibration file ->
    write the new calibration JSON.

    Returns:
        Process exit code: 0 on success, 1 on any validation failure.
    """
    parser = argparse.ArgumentParser(description="Train calibration layer from feedback samples")
    parser.add_argument("--data-dir", default="data/feedback", help="Feedback dataset directory")
    parser.add_argument("--output", default="data/calibration.json", help="Output calibration JSON file")
    parser.add_argument("--history-dir", default=os.environ.get(
        "CALIBRATION_HISTORY_DIR",
        "data/calibration_history"
    ), help="Directory to store calibration history backups")
    parser.add_argument("--epochs", type=int, default=300, help="Training epochs")
    parser.add_argument("--lr", type=float, default=0.5, help="Learning rate")
    parser.add_argument("--l2", type=float, default=0.001, help="L2 regularization")
    parser.add_argument("--min-samples", type=int, default=20, help="Minimum samples required")
    parser.add_argument("--max-per-class", type=int, default=0, help="Max samples per class (0 = all)")
    # Model locations default to env vars so the script matches the serving config.
    parser.add_argument("--deepfake-model-path", default=os.environ.get(
        "DEEPFAKE_MODEL_PATH",
        "garystafford/wav2vec2-deepfake-voice-detector"
    ))
    parser.add_argument("--whisper-model-path", default=os.environ.get(
        "WHISPER_MODEL_PATH",
        "openai/whisper-base"
    ))
    parser.add_argument("--use-local-deepfake-model", action="store_true", default=False)
    parser.add_argument("--use-local-whisper-model", action="store_true", default=False)
    parser.add_argument("--max-audio-duration", type=int, default=30)

    args = parser.parse_args()

    if args.history_dir:
        os.makedirs(args.history_dir, exist_ok=True)

    if not os.path.isdir(args.data_dir):
        print(f"Data directory not found: {args.data_dir}")
        return 1

    samples = iter_audio_files(args.data_dir, max_per_class=args.max_per_class)
    if not samples:
        print("No audio samples found.")
        return 1

    # Only instantiate the (heavy) detector when at least one sample
    # lacks precomputed scores from its sidecar JSON.
    needs_scoring = any("physics_score" not in sample for sample in samples)
    detector = None
    if needs_scoring:
        detector = HybridEnsembleDetector(
            deepfake_model_path=args.deepfake_model_path,
            whisper_model_path=args.whisper_model_path,
            use_local_deepfake_model=args.use_local_deepfake_model,
            use_local_whisper_model=args.use_local_whisper_model,
            max_audio_duration=args.max_audio_duration
        )

    features = []
    labels = []
    skipped = 0

    for sample in samples:
        if "physics_score" in sample and "dl_score" in sample:
            # Reuse scores recorded at feedback time.
            phys_score = sample["physics_score"]
            dl_score = sample["dl_score"]
        else:
            if detector is None:
                skipped += 1
                continue
            scores = detector.extract_scores(sample["path"], input_type="file")
            if scores.get("status") != "success":
                skipped += 1
                continue
            phys_score = scores["physics_score"]
            dl_score = scores["dl_score"]

        features.append([phys_score, dl_score])
        labels.append(sample["label"])

    if skipped:
        print(f"Skipped {skipped} samples due to scoring errors.")

    if len(features) < args.min_samples:
        print(f"Not enough samples to train. Found {len(features)}.")
        return 1

    X = np.array(features, dtype=np.float64)
    y = np.array(labels, dtype=np.float64)

    w, b = train_logreg(X, y, lr=args.lr, epochs=args.epochs, l2=args.l2)
    # NOTE(review): threshold and metrics are computed on the training
    # set itself (no held-out split), so they are optimistic estimates.
    probs = sigmoid(X.dot(w) + b)
    threshold, f1 = best_threshold(y, probs)
    predictions = (probs >= threshold).astype(int)
    accuracy = float((predictions == y).mean())

    output_dir = os.path.dirname(args.output)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    # Archive the existing calibration (plus a small meta record) before
    # overwriting, so the API's rollback endpoint can restore it.
    if os.path.exists(args.output):
        version_id = datetime.utcnow().strftime("%Y%m%dT%H%M%SZ") + "_" + os.urandom(4).hex()
        history_name = f"calibration_{version_id}.json"
        history_path = os.path.join(args.history_dir, history_name)
        shutil.copy2(args.output, history_path)
        meta_path = os.path.join(args.history_dir, f"calibration_{version_id}.meta.json")
        meta = {
            "versionId": version_id,
            "source": args.output,
            "archivedAt": datetime.utcnow().isoformat() + "Z",
            "reason": "self_learning_train"
        }
        with open(meta_path, "w", encoding="utf-8") as handle:
            json.dump(meta, handle, indent=2)

    calibration = {
        "version": 1,
        "trained_at": datetime.utcnow().isoformat() + "Z",
        "weights": [float(w[0]), float(w[1])],
        "bias": float(b),
        "threshold": float(threshold),
        "feature_order": ["physics_score", "dl_score"],
        "metrics": {
            "accuracy": accuracy,
            "f1": float(f1)
        },
        "samples": {
            "count": int(len(features)),
            "ai": int((y == 1).sum()),
            "human": int((y == 0).sum())
        }
    }

    with open(args.output, "w", encoding="utf-8") as handle:
        json.dump(calibration, handle, indent=2)

    print(f"Calibration saved to {args.output}")
    print(f"Accuracy: {accuracy:.3f} | F1: {f1:.3f} | Threshold: {threshold:.2f}")
    return 0
|
| 242 |
+
|
| 243 |
+
|
| 244 |
+
if __name__ == "__main__":
    # Propagate main()'s return value (0 = success, 1 = failure) as the
    # process exit code for shell/CI callers.
    sys.exit(main())
|
tests/conftest.py
ADDED
|
@@ -0,0 +1,144 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import base64
|
| 2 |
+
import importlib
|
| 3 |
+
import os
|
| 4 |
+
import sys
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
|
| 7 |
+
import pytest
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
TEST_API_KEY = "test_key_123"
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
class DummyCalibrator:
    """Minimal stand-in for the detector's calibration component."""
    # ready=False makes the app treat calibration as not loaded, so the
    # uncalibrated code path is exercised by default in tests.
    ready = False
    calibration_path = None
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
class DummyDetector:
    """Deterministic stub used by the test-suite instead of the real hybrid detector."""

    def __init__(self):
        self.calibrator = DummyCalibrator()

    def analyze(self, audio_input, input_type="file", audio_format="mp3", analysis_mode="full"):
        """Return a canned successful detection result echoing the analysis mode."""
        debug_info = {
            "analysis_mode": analysis_mode,
            "used_calibration": False
        }
        return {
            "status": "success",
            "language": "English",
            "classification": "AI_GENERATED",
            "confidenceScore": 0.87,
            "explanation": "Dummy detector response",
            "analysisMode": analysis_mode,
            "debug": debug_info
        }

    def extract_scores(self, audio_input, input_type="file", audio_format="mp3"):
        """Return fixed physics/deep-learning scores for calibration tests."""
        return {
            "status": "success",
            "physics_score": 0.42,
            "dl_score": 0.84,
            "dl_label": "Fake/Deepfake",
            "physics_method": "Physics Analysis",
            "audio_duration": 1.0,
            "was_truncated": False
        }

    def reload_calibration(self, calibration_path=None):
        """Report success only when the given calibration file actually exists."""
        if not calibration_path:
            return False
        return os.path.exists(calibration_path)
|
| 49 |
+
|
| 50 |
+
|
| 51 |
+
def load_app(tmp_path, monkeypatch, overrides=None):
    """Import the Flask app module fresh under a test-friendly environment.

    Sets env vars (model loading skipped, feedback/calibration storage
    rooted under ``tmp_path``), force-reimports ``app`` so its
    module-level config re-reads the environment, then swaps its
    detector for a DummyDetector so no ML models are ever loaded.

    Args:
        tmp_path: pytest temporary directory for all on-disk storage.
        monkeypatch: pytest monkeypatch fixture (auto-restores env).
        overrides: optional env additions; a value of None deletes
            that variable instead of setting it.

    Returns:
        The freshly imported ``app`` module with the stub installed.
    """
    env = {
        "API_KEY": TEST_API_KEY,
        "SKIP_MODEL_LOAD": "true",
        "ENABLE_STREAMING": "true",
        "ENABLE_FEEDBACK_STORAGE": "true",
        "FEEDBACK_STORAGE_DIR": str(tmp_path / "feedback"),
        "FEEDBACK_MAX_BYTES": "2048",
        "CALIBRATION_PATH": str(tmp_path / "calibration.json"),
        "CALIBRATION_HISTORY_DIR": str(tmp_path / "calibration_history"),
        "CALIBRATION_HISTORY_MAX": "5",
        "STREAMING_PARTIAL_INTERVAL_SECONDS": "0.5"
    }
    if overrides:
        env.update(overrides)

    for key, value in env.items():
        if value is None:
            # None override means "remove this variable entirely".
            monkeypatch.delenv(key, raising=False)
        else:
            monkeypatch.setenv(key, str(value))

    # Drop any cached import so module-level env reads happen again.
    if "app" in sys.modules:
        del sys.modules["app"]

    app_module = importlib.import_module("app")
    importlib.reload(app_module)

    # Replace the real detector with the deterministic stub.
    dummy = DummyDetector()
    app_module.detector = dummy

    def init_detector():
        # Keep any lazy re-initialization path pointing at the stub too.
        app_module.detector = dummy
        return True

    app_module.init_detector = init_detector

    return app_module
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
@pytest.fixture
def app_factory(tmp_path, monkeypatch):
    """Fixture factory: build a freshly imported app module with env overrides."""
    def _factory(**overrides):
        return load_app(tmp_path, monkeypatch, overrides=overrides)
    return _factory
|
| 96 |
+
|
| 97 |
+
|
| 98 |
+
@pytest.fixture
def app_module(app_factory):
    """Default app module instance (no environment overrides)."""
    return app_factory()
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
@pytest.fixture
def client(app_module):
    """Flask test client bound to the default app module."""
    return app_module.app.test_client()
|
| 106 |
+
|
| 107 |
+
|
| 108 |
+
@pytest.fixture
def api_headers():
    """Standard JSON request headers carrying the test API key."""
    return {
        "Content-Type": "application/json",
        "x-api-key": TEST_API_KEY
    }
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
@pytest.fixture
def sample_audio_base64():
    """Small base64 blob of 200 zero bytes standing in for mp3 audio."""
    return base64.b64encode(b"\x00" * 200).decode("utf-8")
|
| 119 |
+
|
| 120 |
+
|
| 121 |
+
def find_test_audio_files():
    """Return sorted Paths of .mp3/.wav files in the repo-level test_audio/ directory.

    Returns an empty list when the directory does not exist, so callers
    can skip gracefully on checkouts without sample audio.
    """
    audio_dir = Path(__file__).resolve().parent.parent / "test_audio"
    if not audio_dir.exists():
        return []
    return sorted(entry for entry in audio_dir.iterdir()
                  if entry.suffix.lower() in [".mp3", ".wav"])
|
| 126 |
+
|
| 127 |
+
|
| 128 |
+
def load_test_audio_base64(prefer_extension=".mp3"):
    """Return (path, base64 string) for a test audio file, preferring *prefer_extension*.

    Falls back to the first available file of any supported extension,
    and to (None, None) when no test audio exists at all.
    """
    files = find_test_audio_files()

    preferred = next((p for p in files if p.suffix.lower() == prefer_extension), None)
    if preferred is not None:
        return preferred, base64.b64encode(preferred.read_bytes()).decode("utf-8")

    if files:
        fallback = files[0]
        return fallback, base64.b64encode(fallback.read_bytes()).decode("utf-8")

    return None, None
|
| 137 |
+
|
| 138 |
+
|
| 139 |
+
@pytest.fixture
def test_audio_base64():
    """(path, base64) for a real file from test_audio/; skips when none exist."""
    path, b64_data = load_test_audio_base64(".mp3")
    if not b64_data:
        pytest.skip("No audio files found in test_audio/")
    return path, b64_data
|
tests/test_api.py
ADDED
|
@@ -0,0 +1,177 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import json
|
| 2 |
+
from pathlib import Path
|
| 3 |
+
|
| 4 |
+
import pytest
|
| 5 |
+
|
| 6 |
+
|
| 7 |
+
def test_health(client):
    """/health responds 200, reports healthy status and streaming enabled."""
    response = client.get("/health")
    assert response.status_code == 200
    payload = response.get_json()
    assert payload["status"] == "healthy"
    assert payload["streaming_enabled"] is True


def test_voice_detection_success_with_sample_base64(client, api_headers, sample_audio_base64):
    """A well-formed mp3 payload yields the dummy detector's AI_GENERATED verdict."""
    payload = {
        "language": "English",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/voice-detection", data=json.dumps(payload), headers=api_headers)
    assert response.status_code == 200
    data = response.get_json()
    assert data["status"] == "success"
    assert data["classification"] == "AI_GENERATED"


def test_voice_detection_success_with_test_audio(client, api_headers, test_audio_base64):
    """Real audio from test_audio/ passes end-to-end (skipped when only non-mp3 exists)."""
    path, audio_b64 = test_audio_base64
    if path.suffix.lower() != ".mp3":
        pytest.skip("test_audio file is not mp3 (endpoint only supports mp3).")

    payload = {
        "language": "English",
        "audioFormat": "mp3",
        "audioBase64": audio_b64
    }
    response = client.post("/api/voice-detection", data=json.dumps(payload), headers=api_headers)
    assert response.status_code == 200
    data = response.get_json()
    assert data["status"] == "success"


def test_voice_detection_missing_api_key(client, sample_audio_base64):
    """Requests without an x-api-key header are rejected with 401."""
    payload = {
        "language": "English",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/voice-detection", json=payload)
    assert response.status_code == 401


def test_voice_detection_invalid_api_key(client, api_headers, sample_audio_base64):
    """A wrong API key is rejected with 403."""
    payload = {
        "language": "English",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    headers = dict(api_headers)
    headers["x-api-key"] = "wrong_key"
    response = client.post("/api/voice-detection", json=payload, headers=headers)
    assert response.status_code == 403


def test_voice_detection_invalid_content_type(client, api_headers):
    """A body that is not valid JSON is rejected with 400."""
    response = client.post("/api/voice-detection", data="not json", headers=api_headers)
    assert response.status_code == 400


def test_voice_detection_missing_fields(client, api_headers):
    """Payloads missing the audio fields are rejected with 400."""
    payload = {"language": "English"}
    response = client.post("/api/voice-detection", json=payload, headers=api_headers)
    assert response.status_code == 400


def test_voice_detection_unsupported_language(client, api_headers, sample_audio_base64):
    """Languages outside the supported set are rejected with 400."""
    payload = {
        "language": "Spanish",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/voice-detection", json=payload, headers=api_headers)
    assert response.status_code == 400


def test_voice_detection_unsupported_audio_format(client, api_headers, sample_audio_base64):
    """Formats other than mp3 are rejected with 400 on this endpoint."""
    payload = {
        "language": "English",
        "audioFormat": "wav",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/voice-detection", json=payload, headers=api_headers)
    assert response.status_code == 400


def test_voice_detection_invalid_audio_payload(client, api_headers):
    """An audioBase64 value that is too short/invalid is rejected with 400."""
    payload = {
        "language": "English",
        "audioFormat": "mp3",
        "audioBase64": "short"
    }
    response = client.post("/api/voice-detection", json=payload, headers=api_headers)
    assert response.status_code == 400


def test_voice_detection_analysis_error(app_module, client, api_headers, sample_audio_base64):
    """Detector-level errors surface as HTTP 500."""
    def error_analyze(*args, **kwargs):
        return {"status": "error", "error": "boom"}

    # Patch the stub detector so the endpoint sees a failed analysis.
    app_module.detector.analyze = error_analyze

    payload = {
        "language": "English",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/voice-detection", json=payload, headers=api_headers)
    assert response.status_code == 500


def test_reload_calibration_not_found(client, api_headers):
    """Reloading when no calibration file exists yields 404."""
    response = client.post("/api/reload-calibration", headers=api_headers)
    assert response.status_code == 404


def test_reload_calibration_success(app_module, client, api_headers):
    """Reload succeeds once a calibration file is present on disk."""
    calibration_file = Path(app_module.CALIBRATION_PATH)
    calibration_file.parent.mkdir(parents=True, exist_ok=True)
    calibration_file.write_text("{}", encoding="utf-8")

    response = client.post("/api/reload-calibration", headers=api_headers)
    assert response.status_code == 200


def test_backup_and_rollback_calibration(app_module, client, api_headers):
    """Backup archives the current calibration; rollback restores it by versionId."""
    calibration_file = Path(app_module.CALIBRATION_PATH)
    calibration_file.parent.mkdir(parents=True, exist_ok=True)
    calibration_file.write_text('{"version": "original"}', encoding="utf-8")

    backup_response = client.post("/api/backup-calibration", headers=api_headers)
    assert backup_response.status_code == 200
    backup_payload = backup_response.get_json()
    version_id = backup_payload["versionId"]

    # Overwrite, then roll back to the archived version.
    calibration_file.write_text('{"version": "new"}', encoding="utf-8")

    rollback_response = client.post(
        "/api/rollback-calibration",
        json={"versionId": version_id},
        headers=api_headers
    )
    assert rollback_response.status_code == 200
    assert calibration_file.read_text(encoding="utf-8") == '{"version": "original"}'


def test_backup_calibration_missing_file(client, api_headers):
    """Backing up when no calibration file exists yields 404."""
    response = client.post("/api/backup-calibration", headers=api_headers)
    assert response.status_code == 404


def test_rollback_calibration_missing_version(client, api_headers):
    """Rollback without a versionId in the body is rejected with 400."""
    response = client.post("/api/rollback-calibration", json={}, headers=api_headers)
    assert response.status_code == 400


def test_calibration_history_list(app_module, client, api_headers):
    """The history endpoint lists archived calibration versions."""
    history_dir = Path(app_module.CALIBRATION_HISTORY_DIR)
    history_dir.mkdir(parents=True, exist_ok=True)
    history_file = history_dir / "calibration_20260207T120000Z_deadbeef.json"
    history_file.write_text("{}", encoding="utf-8")

    response = client.get("/api/calibration-history", headers=api_headers)
    assert response.status_code == 200
    payload = response.get_json()
    assert payload["status"] == "success"
    assert payload["history"]
|
tests/test_feedback.py
ADDED
|
@@ -0,0 +1,67 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import base64
|
| 2 |
+
import json
|
| 3 |
+
from pathlib import Path
|
| 4 |
+
|
| 5 |
+
import pytest
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
def test_feedback_success_with_scoring(client, api_headers, app_module):
    """Feedback with runDetection stores the audio plus metadata with detector scores."""
    audio_bytes = b"\x01" * 400
    payload = {
        "label": "AI_GENERATED",
        "audioFormat": "mp3",
        "audioBase64": base64.b64encode(audio_bytes).decode("utf-8"),
        "runDetection": True,
        "metadata": {"source": "unit-test"}
    }

    response = client.post("/api/feedback", json=payload, headers=api_headers)
    assert response.status_code == 200
    data = response.get_json()
    assert data["status"] == "success"

    # Both the raw audio and a JSON metadata sidecar must be persisted.
    storage_dir = Path(app_module.FEEDBACK_STORAGE_DIR)
    assert storage_dir.exists()
    stored_files = list(storage_dir.rglob("*.mp3"))
    assert stored_files, "Expected feedback audio file to be stored"
    meta_files = list(storage_dir.rglob("*.json"))
    assert meta_files, "Expected feedback metadata to be stored"

    metadata = json.loads(meta_files[0].read_text(encoding="utf-8"))
    assert metadata["label"] == "AI_GENERATED"
    assert "physics_score" in metadata
    assert "dl_score" in metadata


def test_feedback_invalid_label(client, api_headers, sample_audio_base64):
    """Labels outside the accepted set are rejected with 400."""
    payload = {
        "label": "UNKNOWN",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/feedback", json=payload, headers=api_headers)
    assert response.status_code == 400


def test_feedback_disabled(app_factory, sample_audio_base64, api_headers):
    """The feedback endpoint returns 403 when storage is disabled via env."""
    app_module = app_factory(ENABLE_FEEDBACK_STORAGE="false")
    client = app_module.app.test_client()

    payload = {
        "label": "HUMAN",
        "audioFormat": "mp3",
        "audioBase64": sample_audio_base64
    }
    response = client.post("/api/feedback", json=payload, headers=api_headers)
    assert response.status_code == 403


def test_feedback_too_large_payload(app_module, client, api_headers):
    """Audio exceeding FEEDBACK_MAX_BYTES is rejected with 413."""
    big_audio = base64.b64encode(b"\x00" * (app_module.FEEDBACK_MAX_BYTES + 10)).decode("utf-8")
    payload = {
        "label": "AI_GENERATED",
        "audioFormat": "mp3",
        "audioBase64": big_audio
    }
    response = client.post("/api/feedback", json=payload, headers=api_headers)
    assert response.status_code == 413
|
tests/test_integration_model.py
ADDED
|
@@ -0,0 +1,60 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
from pathlib import Path
|
| 3 |
+
|
| 4 |
+
import pytest
|
| 5 |
+
|
| 6 |
+
from detector import HybridEnsembleDetector
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
# Opt-in integration tests: heavyweight model downloads are gated behind
# RUN_MODEL_TESTS so the default suite stays fast and offline.
pytestmark = [
    pytest.mark.integration,
    pytest.mark.skipif(
        os.environ.get("RUN_MODEL_TESTS", "").lower() not in ["1", "true", "yes"],
        reason="Integration tests require RUN_MODEL_TESTS=true and model weights available."
    )
]


def find_ai_miss_audio():
    """Locate a known-problematic AI sample.

    Checks the AI_MISS_AUDIO_PATH env override first, then falls back to
    filename heuristics ("miss"/"false"/"hard") in the repo test_audio/
    directory. Returns a Path or None when nothing is found.
    """
    env_path = os.environ.get("AI_MISS_AUDIO_PATH")
    if env_path and Path(env_path).exists():
        return Path(env_path)

    base_dir = Path(__file__).resolve().parent.parent / "test_audio"
    if not base_dir.exists():
        return None

    candidates = []
    for path in base_dir.iterdir():
        if path.suffix.lower() not in [".mp3", ".wav"]:
            continue
        name = path.stem.lower()
        # Heuristic: filenames hinting at misses / false negatives / hard cases.
        if "miss" in name or "false" in name or "hard" in name:
            candidates.append(path)

    return candidates[0] if candidates else None


@pytest.mark.xfail(reason="Known false negative before retraining", strict=False)
def test_known_false_negative_ai_sample():
    """Tracks a known false negative; expected to fail until the model is retrained."""
    audio_path = find_ai_miss_audio()
    if audio_path is None:
        pytest.skip("No known false-negative AI sample provided.")

    # Build the real detector from the same env configuration as serving.
    detector = HybridEnsembleDetector(
        deepfake_model_path=os.environ.get(
            "DEEPFAKE_MODEL_PATH",
            "garystafford/wav2vec2-deepfake-voice-detector"
        ),
        whisper_model_path=os.environ.get(
            "WHISPER_MODEL_PATH",
            "openai/whisper-base"
        ),
        use_local_deepfake_model=os.environ.get("USE_LOCAL_DEEPFAKE_MODEL", "false").lower() in ["1", "true"],
        use_local_whisper_model=os.environ.get("USE_LOCAL_WHISPER_MODEL", "false").lower() in ["1", "true"],
        max_audio_duration=int(os.environ.get("MAX_AUDIO_DURATION", "30"))
    )

    result = detector.analyze(str(audio_path), input_type="file")
    assert result["status"] == "success"
    assert result["classification"] == "AI_GENERATED"
|
tests/test_streaming.py
ADDED
|
@@ -0,0 +1,92 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import base64
|
| 2 |
+
import json
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
class FakeWebSocket:
    """Scripted stand-in for a websocket connection.

    Replays a fixed sequence of inbound messages and records everything
    the handler sends back (JSON-decoded when possible).
    """

    def __init__(self, messages, api_key):
        self._messages = iter(messages)
        self.sent = []
        # Mimic the WSGI environ so the handler can read ?api_key=... from
        # the query string, as the real websocket server would provide.
        self.environ = {"QUERY_STRING": f"api_key={api_key}"}

    def receive(self):
        """Return the next scripted message, or None once exhausted (disconnect)."""
        return next(self._messages, None)

    def send(self, message):
        """Record an outbound message; JSON payloads are stored decoded."""
        try:
            decoded = json.loads(message)
        except Exception:
            decoded = message
        self.sent.append(decoded)
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
def build_pcm16_chunk(sample_rate=16000, channels=1, seconds=1.0):
    """Base64-encode a silent PCM16 buffer of the requested duration.

    PCM16 uses 2 bytes per sample per channel, so the buffer length is
    sample_rate * channels * 2 * seconds (truncated to an int).
    """
    total_bytes = int(sample_rate * channels * 2 * seconds)
    return base64.b64encode(b"\x00" * total_bytes).decode("utf-8")
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
def test_streaming_success_with_partial_and_final(app_module):
    """A start + final-chunk exchange yields ack, progress, partial and final results."""
    start_msg = json.dumps({
        "type": "start",
        "audioFormat": "pcm16",
        "sampleRate": 16000,
        "channels": 1,
        "enablePartial": True,
        "partialIntervalSec": 0.5
    })

    chunk_msg = json.dumps({
        "type": "audio_chunk",
        "audioChunkBase64": build_pcm16_chunk(seconds=1.0),
        "final": True
    })

    ws = FakeWebSocket([start_msg, chunk_msg], api_key=app_module.API_KEY)
    app_module.voice_stream(ws)

    # Only JSON-decoded messages are dicts; ignore anything sent raw.
    types = [msg.get("type") for msg in ws.sent if isinstance(msg, dict)]
    assert "ack" in types
    assert "progress" in types
    assert "partial_result" in types
    assert "final_result" in types


def test_streaming_invalid_api_key(app_module):
    """A bad api_key query parameter produces an immediate error message."""
    start_msg = json.dumps({
        "type": "start",
        "audioFormat": "pcm16",
        "sampleRate": 16000,
        "channels": 1
    })

    ws = FakeWebSocket([start_msg], api_key="bad_key")
    app_module.voice_stream(ws)

    assert ws.sent
    assert ws.sent[0]["type"] == "error"


def test_streaming_invalid_format(app_module):
    """An unsupported audioFormat in the start message produces an error."""
    start_msg = json.dumps({
        "type": "start",
        "audioFormat": "aac",
        "sampleRate": 16000,
        "channels": 1
    })

    ws = FakeWebSocket([start_msg], api_key=app_module.API_KEY)
    app_module.voice_stream(ws)

    assert ws.sent
    assert ws.sent[0]["type"] == "error"


def test_streaming_disabled(app_factory):
    """With ENABLE_STREAMING=false the handler refuses with an error message."""
    app_module = app_factory(ENABLE_STREAMING="false")
    ws = FakeWebSocket([json.dumps({"type": "start"})], api_key=app_module.API_KEY)
    app_module.voice_stream(ws)

    assert ws.sent
    assert ws.sent[0]["type"] == "error"
|
try.ipynb
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|