---
title: VoiceGuard API
emoji: 🛡️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 7860
---
# AI-Generated Voice Detector API

A production-ready REST API that detects whether a given voice recording is **AI-generated** or **human**.

Built for the **AI-Generated Voice Detection Challenge**, with specific support for **Tamil, English, Hindi, Malayalam, and Telugu**.

---
## 🚀 Features

- **Multilingual support**: Uses the **MMS-300M (Massively Multilingual Speech)** anti-deepfake model (`nii-yamagishilab/mms-300m-anti-deepfake`), derived from **XLS-R**, which supports 100+ languages including the Indic languages above.
- **Strict API specification**: Compliant with the challenge requirements (Base64 MP3 input, standardized JSON response).
- **Smart hybrid detection**: Combines deep-learning embeddings with **acoustic heuristics** (pitch, flatness, liveness) for "Conservative Consensus" detection.
- **Explainability**: Provides a human-readable explanation for every decision.
- **Secure**: Protected via `x-api-key` header authentication.

---
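The "Conservative Consensus" idea can be sketched as follows. This is an illustrative assumption about the decision rule, not the actual implementation; the function names and thresholds are invented for the example:

```python
# Hypothetical sketch of "Conservative Consensus": flag a clip as
# AI-generated only when the deep model AND the acoustic heuristics agree.
# All names and thresholds here are illustrative assumptions.

def heuristics_say_ai(pitch_variance: float, flatness: float) -> bool:
    """Very flat pitch and spectrum are typical of synthetic speech."""
    return pitch_variance < 5.0 and flatness > 0.6

def classify(model_score: float, pitch_variance: float, flatness: float) -> str:
    """model_score is the model's P(AI) in [0, 1]."""
    model_says_ai = model_score > 0.5
    if model_says_ai and heuristics_say_ai(pitch_variance, flatness):
        return "AI_GENERATED"
    return "HUMAN"  # conservative default: any disagreement -> HUMAN
```

The conservative part is the final `return`: unless both signals agree, the clip is labeled human, which trades recall for a lower false-positive rate.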
## 🛠️ Tech Stack

- **Framework**: FastAPI (Python)
- **Model**: PyTorch + Hugging Face Transformers (`nii-yamagishilab/mms-300m-anti-deepfake`)
- **Toolkit**: **SpeechBrain** (environment ready for advanced audio processing)
- **Audio processing**: `pydub` (FFmpeg) + `librosa`
- **Deployment**: Uvicorn

---
## 📥 Installation

### 1. Prerequisites

- **Python 3.8+**
- **FFmpeg**: Required for audio processing (`pydub`).
  - **Linux**: `sudo apt install ffmpeg`
  - **Windows**: [Download here](https://ffmpeg.org/download.html) and add it to your `PATH`.

### 2. Setup (Linux / macOS)
```bash
# Create a virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
### 3. Setup (Windows)

```powershell
# Create a virtual environment
python -m venv venv

# Activate it
.\venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
### 4. Configure Environment

Create a `.env` file in the root directory:

```bash
API_KEY=test-key-123
```

---
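On the server side, the key check reduces to comparing the request's `x-api-key` header against `API_KEY` from the environment. The sketch below shows the logic framework-agnostically; in the real app this presumably lives in `app/auth.py` as a FastAPI dependency, whose exact shape is not shown here:

```python
# Sketch of the x-api-key check (assumed shape; actual app/auth.py may differ).
import os

def check_api_key(headers: dict) -> bool:
    """Return True when the request's x-api-key matches API_KEY from .env."""
    expected = os.environ.get("API_KEY")
    provided = headers.get("x-api-key")
    return expected is not None and provided == expected
```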
## ▶️ Running the Server

**Universal command:**

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

*The server starts at `http://localhost:8000`.*

---
## 📡 API Usage

### Endpoint: `POST /api/voice-detection`

#### Headers

| Key | Value |
| --- | --- |
| `x-api-key` | `test-key-123` (the `API_KEY` value from your `.env`) |
| `Content-Type` | `application/json` |

#### Request Body

```json
{
  "language": "Tamil",
  "audioFormat": "mp3",
  "audioBase64": "<BASE64_ENCODED_MP3_STRING>"
}
```
#### Response Example

```json
{
  "status": "success",
  "language": "Tamil",
  "classification": "HUMAN",
  "confidenceScore": 0.98,
  "explanation": "High pitch variance and natural prosody detected."
}
```

---
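A minimal Python client for this request/response pair might look like the following sketch. The file name, the use of the `requests` library, and the localhost URL are illustrative assumptions:

```python
# Hypothetical client: base64-encode an MP3 and call the detection API.
import base64
import json

def build_payload(mp3_path: str, language: str = "Tamil") -> dict:
    """Build the JSON body expected by POST /api/voice-detection."""
    with open(mp3_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"language": language, "audioFormat": "mp3", "audioBase64": audio_b64}

if __name__ == "__main__":
    import requests  # assumed available: pip install requests

    payload = build_payload("sample.mp3")  # illustrative file name
    resp = requests.post(
        "http://localhost:8000/api/voice-detection",
        headers={"x-api-key": "test-key-123", "Content-Type": "application/json"},
        data=json.dumps(payload),
    )
    print(resp.json())
```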
## 🧪 Testing

### 1. Run the Verification Script

A built-in test suite verifies the audio pipeline and model inference:

```bash
python verify_pipeline.py
```

### 2. Run the End-to-End API Test

To test the actual running server with a real generated MP3 file:

```bash
# Make sure the server is running in another terminal first!
python test_api.py
```
### 3. cURL Command

```bash
curl -X POST http://127.0.0.1:8000/api/voice-detection \
  -H "x-api-key: test-key-123" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "English",
    "audioFormat": "mp3",
    "audioBase64": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU2LjM2LjEwMAAAAAAA..."
  }'
```

---
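To produce a real `audioBase64` value for a cURL request, encode a local file first. The snippet below creates a tiny placeholder file for demonstration; substitute your real MP3 (the file name is illustrative):

```bash
# Demo only: create a tiny placeholder file (replace with your real MP3).
printf 'ID3...' > sample.mp3

# Encode to a single-line Base64 string (the tr pipe keeps this portable
# across GNU and macOS base64, which wrap lines differently).
AUDIO_B64=$(base64 sample.mp3 | tr -d '\n')
echo "$AUDIO_B64"
```

The resulting `$AUDIO_B64` string goes in the `audioBase64` field of the cURL body.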
## 📂 Project Structure

```text
voice-detector/
├── app/
│   ├── main.py            # API entry point & routes
│   ├── infer.py           # Model inference logic (XLS-R + classifier)
│   ├── audio.py           # Audio normalization (Base64 -> 16 kHz WAV)
│   └── auth.py            # x-api-key authentication utilities
├── model/                 # Model weights storage
├── requirements.txt       # Python dependencies
├── .env                   # Config keys
├── verify_pipeline.py     # System health check script
└── test_api.py            # Live API integration test
```

---
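The job of `audio.py` (Base64 MP3 in, 16 kHz mono WAV out) could be sketched roughly as below. The function names are assumptions, not the module's real API, and `pydub` needs FFmpeg installed at runtime:

```python
# Rough sketch of audio.py's normalization job (names are assumptions).
import base64
import io

def decode_base64_audio(audio_b64: str) -> bytes:
    """Turn the request's audioBase64 field back into raw MP3 bytes."""
    return base64.b64decode(audio_b64)

def normalize_to_wav(mp3_bytes: bytes) -> bytes:
    """Resample to 16 kHz mono WAV, the format the encoder expects."""
    from pydub import AudioSegment  # requires FFmpeg on the system

    segment = AudioSegment.from_file(io.BytesIO(mp3_bytes), format="mp3")
    segment = segment.set_frame_rate(16000).set_channels(1)
    buf = io.BytesIO()
    segment.export(buf, format="wav")
    return buf.getvalue()
```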
## 🧠 Model Logic (How it works)

1. **Input**: Accepts a Base64-encoded MP3.
2. **Normalization**: Converts it to **16,000 Hz mono WAV**.
3. **Encoder**: Feeds the audio into the **XLS-R-based Wav2Vec2 encoder** to obtain a 1024-dimensional embedding.
4. **Feature extraction**: Calculates **pitch variance** to detect robotic flatness.
5. **Classifier**: A linear layer combines `[Embedding (1024) + Pitch (1)]` to predict `AI_GENERATED` or `HUMAN`.
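Step 5 can be sketched as a single linear layer over the concatenated feature vector. The weights below are random stand-ins so the shapes are concrete; the real classifier is trained, and the class ordering is an assumption:

```python
# Illustrative sketch of the [embedding (1024) + pitch (1)] -> label head.
# Weights are random placeholders, NOT the trained classifier.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 1025))  # 2 classes x (1024 + 1) features
b = np.zeros(2)

def classify(embedding: np.ndarray, pitch_variance: float) -> str:
    features = np.concatenate([embedding, [pitch_variance]])  # shape (1025,)
    logits = W @ features + b                                 # shape (2,)
    return ["AI_GENERATED", "HUMAN"][int(np.argmax(logits))]
```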