---
title: AudioShield AI Voice Detector
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# AudioShield AI: Voice Fraud Detection System

> **Problem Statement 01**: AI-Generated Voice Detection for Regional Languages

![Architecture](Architechture.png)

## 🚀 Overview

**AudioShield AI** is a high-performance REST API designed to detect AI-generated voice deepfakes with exceptional accuracy. Built for the **GUVI Hackathon**, it specifically addresses the challenge of identifying synthetic audio in **Tamil, English, Hindi, Malayalam, and Telugu**.

Unlike standard detectors, AudioShield uses a **Multi-Model Voting Ensemble** approach, aggregating the intelligence of 4 state-of-the-art Wav2Vec2 models to make a final, highly reliable decision.

## 🎯 Problem It Solves

With the rise of Generative AI, voice scams and deepfakes are becoming indistinguishable from reality. Financial fraud, impersonation, and misinformation are growing threats. AudioShield provides a robust, scalable defense mechanism that can be integrated into calls, messaging apps, and verification systems.

## ✨ Key Features

* **🛡️ Voting Ensemble Power**: Leverages 4 distinct AI models (MelodyMachine, Mo-Creator, Hemgg, Gustking-XLSR) to minimize false positives.
* **🌍 Multi-Lingual Support**: Optimized for Indian regional languages (Tamil, Telugu, Hindi, Malayalam) + English.
* **⚡ Zero Cold Start**: Implements a "Warm-up" routine to ensure the first API request is as fast as the 100th.
* **🚀 Render-Ready**: Configured for seamless deployment on cloud platforms like Render.
* **🔍 Explainable AI**: Provides detailed JSON responses with classification confidence and logic.
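
The voting ensemble listed above can be illustrated with a minimal sketch. This is not the code in `detect.py`; it only shows one plausible way to combine four per-model scores into a single verdict. The `ModelVote` class, the `aggregate_votes` function, the weights, the `0.5` threshold, and the `"HUMAN"` label are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelVote:
    """One model's opinion about a clip (hypothetical structure)."""
    name: str
    fake_probability: float  # probability that the clip is AI-generated
    weight: float = 1.0      # assumed weight; the real weights are not documented


def aggregate_votes(votes, threshold=0.5):
    """Combine per-model probabilities with a weighted average."""
    total_weight = sum(v.weight for v in votes)
    if total_weight <= 0:
        raise ValueError("At least one vote with a positive weight is required.")

    # Weighted mean of the individual "fake" probabilities.
    score = sum(v.fake_probability * v.weight for v in votes) / total_weight

    # Count how many models individually crossed the threshold.
    flagged = sum(1 for v in votes if v.fake_probability >= threshold)

    is_fake = score >= threshold
    return {
        # "HUMAN" is an assumed label; only "AI_GENERATED" appears in this README.
        "classification": "AI_GENERATED" if is_fake else "HUMAN",
        # Report confidence in whichever class was chosen.
        "confidenceScore": round(score if is_fake else 1 - score, 2),
        "explanation": (
            f"Ensemble Analysis: {flagged}/{len(votes)} models "
            "flagged this audio as AI-generated."
        ),
    }


# Example scores are made up; the heavier weight on the XLSR model is an assumption.
example_votes = [
    ModelVote("MelodyMachine", 0.90),
    ModelVote("Mo-Creator", 0.80),
    ModelVote("Hemgg", 0.95),
    ModelVote("Gustking-XLSR", 0.99, weight=2.0),
]
result = aggregate_votes(example_votes)
```

A weighted average lets a stronger model influence the score more without letting it override the others outright, while the per-model count feeds the human-readable explanation.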
## 🏗️ System Architecture

The system follows a **Microservices-ready, Layered Architecture**:

```mermaid
graph TD
    User[Client / Postman] -->|"HTTP POST (Base64)"| API[FastAPI Service]
    API -->|"Async Thread"| Engine[Detection Engine]

    subgraph "Ensemble Committee (The AI Core)"
        Engine -->|Input| M1[MelodyMachine]
        Engine -->|Input| M2[Mo-Creator]
        Engine -->|Input| M3[Hemgg]
        Engine -->|Input| M4["Gustking (XLSR)"]

        M1 -->|Vote| Agg[Weighted Aggregator]
        M2 -->|Vote| Agg
        M3 -->|Vote| Agg
        M4 -->|Vote| Agg
    end

    Agg -->|Final Score| Verdict[Classification Logic]
    Verdict -->|JSON Response| User
```

### Core Components

1. **FastAPI Layer (`app.py`)**: Handles HTTP requests, validation, and async processing.
2. **Detection Engine (`detect.py`)**: Manages model loading, inference, and the ensemble voting logic.
3. **Models**:
    * `MelodyMachine/Deepfake-audio-detection-V2`
    * `mo-thecreator/Deepfake-audio-detection`
    * `Hemgg/Deepfake-audio-detection`
    * `Gustking/wav2vec2-large-xlsr-deepfake-audio-classification` (The "Expert" model)

## 🛠️ Tech Stack

* **Language**: Python 3.10+
* **API Framework**: FastAPI, Uvicorn
* **ML Libraries**: PyTorch, Transformers, Librosa, NumPy
* **Deployment**: Docker-ready, Render-compatible

## 🚀 Installation & Usage

### 1. Clone the Repository

```bash
git clone https://github.com/krish1440/AI-Generated-Voice-Detection.git
cd AI-Generated-Voice-Detection
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Server

```bash
python app.py
```

*The server will start on port `8000` (or the `PORT` env var).*

*Note: On the first run, it will download necessary model weights (approx.
2-3GB).*

## 🔌 API Documentation

### Detect Voice

**Endpoint**: `POST /api/voice-detection`

**Request Body** (JSON):

```json
{
  "language": "Tamil",
  "audioFormat": "mp3",
  "audioBase64": ""
}
```

**Response** (Success):

```json
{
  "status": "success",
  "language": "Tamil",
  "classification": "AI_GENERATED",
  "confidenceScore": 0.98,
  "explanation": "Ensemble Analysis: 4/4 models flagged this audio as AI-generated."
}
```

**Response** (Error):

```json
{
  "status": "error",
  "message": "Invalid Base64 encoding."
}
```

## ☁️ Deployment (Hugging Face Spaces)

This project is Dockerized for Hugging Face Spaces.

1. Create a new **Space** on Hugging Face using the **Docker** SDK.
2. Connect your GitHub repository.
3. Hugging Face will automatically build using the `Dockerfile`.
4. The API will be live at `https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME/api/voice-detection`.

**Note**: The Dockerfile builds `ffmpeg` and runs as user `1000` for security compliance on Spaces.

> **Tip**: If the build fails with a registry error, try "Factory Reboot" in the Settings tab.

---

*Developed for GUVI Hackathon.*
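
For reference, a request to the `POST /api/voice-detection` endpoint documented above could be sent from Python roughly as follows. This is a sketch using only the standard library. The helper names `build_payload` and `detect_voice` and the file name `sample.mp3` are hypothetical, and the default base URL assumes a local server on port `8000`.

```python
import base64
import json
import urllib.request


def build_payload(audio_bytes, language="Tamil", audio_format="mp3"):
    """Build the JSON request body with the audio encoded as Base64."""
    return {
        "language": language,
        "audioFormat": audio_format,
        "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
    }


def detect_voice(audio_bytes, base_url="http://localhost:8000", **kwargs):
    """POST the audio to the detection endpoint and return the parsed JSON.

    urllib raises urllib.error.HTTPError for 4xx/5xx status codes, so
    callers may want to catch that exception to read an error body.
    """
    body = json.dumps(build_payload(audio_bytes, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/voice-detection",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Usage (requires a running server and a local audio file):
# with open("sample.mp3", "rb") as f:
#     print(detect_voice(f.read(), language="Tamil", audio_format="mp3"))
```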