---
title: VoiceGuard API
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 7860
---
# AI-Generated Voice Detector API

A production-ready REST API that detects whether a given voice recording is AI-generated or human.

Built for the AI-Generated Voice Detection Challenge, with dedicated support for Tamil, English, Hindi, Malayalam, and Telugu.
## 🚀 Features

- **Multilingual Support**: Uses the state-of-the-art MMS-300M (Massively Multilingual Speech) model (`nii-yamagishilab/mms-300m-anti-deepfake`), derived from XLS-R and supporting 100+ languages, including Indic languages.
- **Strict API Specification**: Compliant with the challenge requirements (Base64 MP3 input, standardized JSON response).
- **Smart Hybrid Detection**: Combines deep-learning embeddings with acoustic heuristics (pitch, flatness, liveness) for "Conservative Consensus" detection.
- **Explainability**: Provides human-readable explanations for every decision.
- **Secure**: Protected via `x-api-key` header authentication.
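At its core, the `x-api-key` check is a constant-time comparison of the incoming header against the configured key. The sketch below is framework-agnostic and illustrative (the function name `check_api_key` is not part of the project); in the FastAPI app the equivalent logic would live in a dependency:

```python
import hmac
from typing import Optional


def check_api_key(header_value: Optional[str], expected: str) -> bool:
    """Return True only when the x-api-key header matches the configured key.

    hmac.compare_digest runs in constant time, avoiding the timing
    side channel that a plain `==` string comparison could leak.
    """
    if header_value is None:
        return False
    return hmac.compare_digest(header_value.encode(), expected.encode())


# In the FastAPI app this would typically run inside a dependency that
# raises HTTPException(status_code=401) when the check fails.
```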
## 🛠️ Tech Stack

- **Framework**: FastAPI (Python)
- **Model**: PyTorch + HuggingFace Transformers (`nii-yamagishilab/mms-300m-anti-deepfake`)
- **Toolkit**: SpeechBrain (environment ready for advanced audio processing)
- **Audio Processing**: `pydub` (ffmpeg) + `librosa`
- **Deployment**: Uvicorn
## 📥 Installation

### 1. Pre-requisites

- Python 3.8+
- **FFmpeg**: Required for audio processing (`pydub`).
  - Linux: `sudo apt install ffmpeg`
  - Windows: download FFmpeg and add it to your `PATH`.
### 2. Setup (Linux / macOS)

```bash
# Create virtual environment
python3 -m venv venv

# Activate
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
### 3. Setup (Windows)

```bash
# Create virtual environment
python -m venv venv

# Activate
.\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
### 4. Configure Environment

Create a `.env` file in the root directory:

```
API_KEY=test-key-123
```
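How the app consumes this file depends on its startup code; the snippet below is a minimal standard-library sketch of `.env` loading (the actual project may use a helper such as `python-dotenv`, and the name `load_env` is illustrative):

```python
import os


def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Blank lines, comments, and lines without '=' are skipped;
    variables already present in the environment are not overwritten.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())


# At startup the app would then do something like:
# load_env()
# api_key = os.environ["API_KEY"]
```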
## ▶️ Running the Server

Universal command:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

The server will start at `http://localhost:8000`.
## 📡 API Usage

**Endpoint**: `POST /api/voice-detection`

### Headers

| Key | Value |
|---|---|
| `x-api-key` | `your-secret-key-123` |
| `Content-Type` | `application/json` |
### Request Body

```json
{
  "language": "Tamil",
  "audioFormat": "mp3",
  "audioBase64": "<BASE64_ENCODED_MP3_STRING>"
}
```
### Response Example

```json
{
  "status": "success",
  "language": "Tamil",
  "classification": "HUMAN",
  "confidenceScore": 0.98,
  "explanation": "High pitch variance and natural prosody detected."
}
```
## 🧪 Testing

### 1. Run the Verification Script

We have a built-in test suite that verifies the audio pipeline and model inference:

```bash
python verify_pipeline.py
```
### 2. Run the End-to-End API Test

To test the actual running server with a real generated MP3 file:

```bash
# Ensure the server is running in another terminal first!
python test_api.py
```
### 3. cURL Command

```bash
curl -X POST http://127.0.0.1:8000/api/voice-detection \
  -H "x-api-key: your-secret-key-123" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "English",
    "audioFormat": "mp3",
    "audioBase64": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU2LjM2LjEwMAAAAAAA..."
  }'
```
## 📂 Project Structure

```
voice-detector/
├── app/
│   ├── main.py            # API entry point & routes
│   ├── infer.py           # Model inference logic (XLS-R + classifier)
│   ├── audio.py           # Audio normalization (Base64 -> 16 kHz WAV)
│   └── auth.py            # Utilities
├── model/                 # Model weights storage
├── requirements.txt       # Python dependencies
├── .env                   # Config keys
├── verify_pipeline.py     # System health check script
└── test_api.py            # Live API integration test
```
## 🧠 Model Logic (How It Works)

1. **Input**: Takes a Base64-encoded MP3.
2. **Normalization**: Converts it to 16,000 Hz mono WAV.
3. **Encoder**: Feeds the audio into Wav2Vec2-XLS-R-53 to get a 1024-dimensional embedding.
4. **Feature Extraction**: Calculates pitch variance to detect robotic flatness.
5. **Classifier**: A linear layer combines `[Embedding (1024) + Pitch (1)]` to predict `AI_GENERATED` or `HUMAN`.
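The final two steps can be sketched as a single linear layer over the concatenated feature vector. Everything below (weights, bias, threshold, function names) is a toy illustration in plain Python, not the actual PyTorch head:

```python
import math


def pitch_variance(f0_track):
    """Variance of a frame-level pitch (f0) track in Hz.

    Near-zero variance is the 'robotic flatness' cue described above.
    """
    mean = sum(f0_track) / len(f0_track)
    return sum((x - mean) ** 2 for x in f0_track) / len(f0_track)


def classify(embedding, pitch_var, weights, bias):
    """Linear layer over the concatenated [embedding (1024) | pitch (1)] vector.

    sigmoid(score) > 0.5 is read as HUMAN, otherwise AI_GENERATED.
    """
    features = list(embedding) + [pitch_var]
    score = sum(w * x for w, x in zip(weights, features)) + bias
    prob_human = 1.0 / (1.0 + math.exp(-score))
    label = "HUMAN" if prob_human > 0.5 else "AI_GENERATED"
    return label, prob_human


# Toy demo: zero embedding weights, so the decision rides on pitch variance alone.
weights = [0.0] * 1024 + [1.0]
lively = pitch_variance([110.0, 180.0, 95.0, 210.0])   # varied pitch track (Hz)
flat = pitch_variance([150.0, 150.0, 150.0, 150.0])    # monotone pitch track
print(classify([0.0] * 1024, lively, weights, bias=-500.0)[0])  # HUMAN
print(classify([0.0] * 1024, flat, weights, bias=-500.0)[0])    # AI_GENERATED
```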