---
title: AudioShield AI Voice Detector
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

AudioShield AI: Voice Fraud Detection System

Problem Statement 01: AI-Generated Voice Detection for Regional Languages


🚀 Overview

AudioShield AI is a REST API that detects AI-generated voice deepfakes. It was built for the GUVI Hackathon and targets synthetic speech in Tamil, English, Hindi, Malayalam, and Telugu.

Most detectors rely on a single model. AudioShield instead uses a multi-model voting ensemble: it combines the predictions of 4 Wav2Vec2-based classifiers into one final decision, so no single model's errors decide the verdict.

🎯 Problem It Solves

As generative AI improves, synthetic voices are becoming hard to distinguish from real speech. Voice scams, impersonation, and misinformation are growing threats as a result. AudioShield is designed to be integrated into calling, messaging, and verification systems as an automated check for synthetic audio.

✨ Key Features

  • 🛡️ Voting Ensemble: Combines 4 distinct models (MelodyMachine, Mo-Creator, Hemgg, Gustking-XLSR) to reduce false positives.
  • 🌍 Multi-Lingual Support: Optimized for Indian regional languages (Tamil, Telugu, Hindi, Malayalam) plus English.
  • ⚡ Zero Cold Start: Runs a warm-up routine at startup, so the first API request does not pay the model-initialization cost and is as fast as later ones.
  • 🚀 Render-Ready: Configured for deployment on cloud platforms such as Render.
  • 🔍 Explainable AI: Returns JSON responses with the classification, a confidence score, and a short explanation of the ensemble vote.
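The warm-up idea can be sketched as follows. This is a minimal illustration, not the project's code. It assumes each model is a callable that takes float samples at 16 kHz (the usual Wav2Vec2 input rate). It also assumes that one pass over a short silent clip at startup absorbs the one-off initialization costs. `warm_up`, `SAMPLE_RATE` and `WARMUP_SECONDS` are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical defaults: Wav2Vec2 checkpoints typically expect 16 kHz mono audio.
SAMPLE_RATE = 16_000
WARMUP_SECONDS = 1.0

# Assumption: a "model" is any callable mapping audio samples to a score.
Model = Callable[[Sequence[float]], float]


def warm_up(models: Dict[str, Model],
            sample_rate: int = SAMPLE_RATE,
            seconds: float = WARMUP_SECONDS) -> Dict[str, float]:
    """Run every model once on silence and return per-model latency in seconds.

    The first call to a freshly loaded model often pays one-off costs
    (lazy initialization, memory allocation, caches). Paying them at
    startup keeps them off the first real request.
    """
    silence: List[float] = [0.0] * int(sample_rate * seconds)
    timings: Dict[str, float] = {}
    for name, model in models.items():
        start = time.perf_counter()
        model(silence)  # result is discarded; only the side effects matter
        timings[name] = time.perf_counter() - start
    return timings
```

In a FastAPI app, a call like this would typically run once in a startup or lifespan hook, after the models are loaded.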

πŸ—οΈ System Architecture

The system follows a Microservices-ready, Layered Architecture:

graph TD
    User[Client / Postman] -->|"HTTP POST (Base64)"| API[FastAPI Service]
    API -->|"Async Thread"| Engine[Detection Engine]
    
    subgraph "Ensemble Committee (The AI Core)"
        Engine -->|Input| M1[MelodyMachine]
        Engine -->|Input| M2[Mo-Creator]
        Engine -->|Input| M3[Hemgg]
        Engine -->|Input| M4["Gustking (XLSR)"]
        
        M1 -->|Vote| Agg[Weighted Aggregator]
        M2 -->|Vote| Agg
        M3 -->|Vote| Agg
        M4 -->|Vote| Agg
    end
    
    Agg -->|Final Score| Verdict[Classification Logic]
    Verdict -->|JSON Response| User
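The aggregation step at the bottom of the diagram can be sketched like this. It is an illustration under stated assumptions, not the repository's `detect.py`:

- Each model is assumed to report a probability that the clip is AI-generated.
- The per-model weights and the 0.5 decision threshold are placeholders.
- `HUMAN` is an assumed label for the non-synthetic class; only `AI_GENERATED` appears in the documented response.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Verdict:
    classification: str
    confidence_score: float
    explanation: str


# Hypothetical weights: the README calls Gustking (XLSR) the "Expert" model,
# so this sketch gives it a larger say. The real weights are not documented.
WEIGHTS: Dict[str, float] = {
    "MelodyMachine": 1.0,
    "Mo-Creator": 1.0,
    "Hemgg": 1.0,
    "Gustking": 2.0,
}
THRESHOLD = 0.5  # hypothetical decision boundary on the weighted score


def aggregate(fake_probs: Dict[str, float],
              weights: Dict[str, float] = WEIGHTS,
              threshold: float = THRESHOLD) -> Verdict:
    """Combine per-model P(AI-generated) into one weighted verdict."""
    total_weight = sum(weights[name] for name in fake_probs)
    score = sum(weights[name] * p for name, p in fake_probs.items()) / total_weight

    is_fake = score >= threshold
    # Each model also casts a hard vote, used here only for the explanation.
    votes = sum(1 for p in fake_probs.values() if p >= threshold)
    label = "AI_GENERATED" if is_fake else "HUMAN"  # "HUMAN" is an assumed label
    # Report confidence in the chosen class, not always in AI_GENERATED.
    confidence = score if is_fake else 1.0 - score
    explanation = (f"Ensemble Analysis: {votes}/{len(fake_probs)} models "
                   f"flagged this audio as AI-generated.")
    return Verdict(label, round(confidence, 2), explanation)
```

A weighted mean lets a trusted model outweigh the others without giving it a veto. The hard-vote count is kept separately so the explanation stays readable.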

Core Components

  1. FastAPI Layer (app.py): Handles HTTP requests, validation, and async processing.
  2. Detection Engine (detect.py): Manages model loading, inference, and the ensemble voting logic.
  3. Models:
    • MelodyMachine/Deepfake-audio-detection-V2
    • mo-thecreator/Deepfake-audio-detection
    • Hemgg/Deepfake-audio-detection
    • Gustking/wav2vec2-large-xlsr-deepfake-audio-classification (The "Expert" model)
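Below is a rough sketch of how the Detection Engine could load the four checkpoints and reduce each one's output to a single probability. It assumes the Hugging Face `transformers` `pipeline("audio-classification", model=...)` API, which returns a list of `{"label": ..., "score": ...}` dictionaries. The set of label strings treated as "fake" is a guess, because each checkpoint defines its own labels in its `id2label` config. `FAKE_LABELS`, `load_committee` and `fake_probability` are hypothetical names.

```python
from typing import Dict, Iterable, Set

MODEL_IDS: Dict[str, str] = {
    "MelodyMachine": "MelodyMachine/Deepfake-audio-detection-V2",
    "Mo-Creator": "mo-thecreator/Deepfake-audio-detection",
    "Hemgg": "Hemgg/Deepfake-audio-detection",
    "Gustking": "Gustking/wav2vec2-large-xlsr-deepfake-audio-classification",
}

# Assumption: label strings vary per checkpoint; adjust after inspecting
# each model's id2label mapping.
FAKE_LABELS: Set[str] = {"fake", "spoof", "deepfake"}


def load_committee(model_ids: Dict[str, str] = MODEL_IDS):
    """Hypothetical helper: one audio-classification pipeline per checkpoint.

    Downloads the weights from the Hugging Face Hub on first use.
    """
    from transformers import pipeline  # imported lazily; heavy dependency

    return {name: pipeline("audio-classification", model=repo)
            for name, repo in model_ids.items()}


def fake_probability(predictions: Iterable[dict],
                     fake_labels: Set[str] = FAKE_LABELS) -> float:
    """Hypothetical helper: sum the scores of labels that denote synthetic speech."""
    return sum(p["score"] for p in predictions
               if p["label"].lower() in fake_labels)
```

Each pipeline can be called on a file path, in which case `transformers` decodes the audio with ffmpeg. The per-model probabilities then feed the weighted aggregator.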

🛠️ Tech Stack

  • Language: Python 3.10+
  • API Framework: FastAPI, Uvicorn
  • ML Libraries: PyTorch, Transformers, Librosa, NumPy
  • Deployment: Docker-ready, Render-compatible

🚀 Installation & Usage

1. Clone the Repository

git clone https://github.com/krish1440/AI-Generated-Voice-Detection.git
cd AI-Generated-Voice-Detection

2. Install Dependencies

pip install -r requirements.txt

3. Run the Server

python app.py

The server listens on port 8000 by default, or on the port given by the PORT environment variable. On the first run it downloads the model weights from the Hugging Face Hub (approx. 2-3 GB).
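The port rule above comes down to one environment lookup. The snippet below is a sketch that assumes `app.py` exposes a FastAPI instance and starts it with `uvicorn.run`. `resolve_port` and `serve` are hypothetical helpers. Binding to `0.0.0.0` is also an assumption; it is usually needed for the server to be reachable from outside a container.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8000


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: return $PORT if set, otherwise the default of 8000."""
    env = os.environ if env is None else env
    return int(env.get("PORT", DEFAULT_PORT))


def serve(app) -> None:
    """Hypothetical helper: start Uvicorn on the resolved port (blocks until shutdown)."""
    import uvicorn  # imported lazily so resolve_port works without it

    uvicorn.run(app, host="0.0.0.0", port=resolve_port())
```

For example, `PORT=9000 python app.py` would move the server to port 9000 under this scheme.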

🔌 API Documentation

Detect Voice

Endpoint: POST /api/voice-detection

Request Body (JSON):

{
  "language": "Tamil",
  "audioFormat": "mp3",
  "audioBase64": "<Base64 encoded MP3 string>"
}
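Here is a minimal client sketch that uses only the Python standard library to build the request body above. The base URL is a placeholder and `build_request` is a hypothetical helper. The JSON field names are the ones documented in the request body.

```python
import base64
import json
import urllib.request


def build_request(mp3_bytes: bytes, language: str,
                  base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Hypothetical helper: wrap raw MP3 bytes in the /api/voice-detection JSON body."""
    payload = {
        "language": language,
        "audioFormat": "mp3",
        # Standard Base64, decoded to str so it can be embedded in JSON.
        "audioBase64": base64.b64encode(mp3_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url}/api/voice-detection",  # base_url is a placeholder
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To use it, read a file in binary mode, pass its bytes to `build_request`, send the result with `urllib.request.urlopen`, and parse the reply with `json.load`.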

Response (Success):

{
  "status": "success",
  "language": "Tamil",
  "classification": "AI_GENERATED",
  "confidenceScore": 0.98,
  "explanation": "Ensemble Analysis: 4/4 models flagged this audio as AI-generated."
}
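The "Async Thread" step in the architecture diagram keeps blocking, CPU-heavy inference off the event loop. Below is a framework-free sketch of how a handler could produce the success response with `asyncio.to_thread` (Python 3.9+). Starlette and FastAPI offer the same idea through `run_in_threadpool`, but the project's actual mechanism is not documented here. `handle_detection` and `detect_sync` are hypothetical; `detect_sync` stands in for the ensemble and returns the classification, confidence and explanation.

```python
import asyncio
import base64
from typing import Callable, Tuple

# Hypothetical signature for the blocking ensemble call:
# raw audio bytes -> (classification, confidenceScore, explanation)
Detector = Callable[[bytes], Tuple[str, float, str]]


async def handle_detection(body: dict, detect_sync: Detector) -> dict:
    """Hypothetical handler: decode, run inference in a worker thread, build the reply."""
    audio = base64.b64decode(body["audioBase64"])
    # Blocking model inference runs in a thread so the event loop stays
    # free to accept other requests in the meantime.
    label, score, explanation = await asyncio.to_thread(detect_sync, audio)
    return {
        "status": "success",
        "language": body["language"],
        "classification": label,
        "confidenceScore": score,
        "explanation": explanation,
    }
```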

Response (Error):

{
  "status": "error",
  "message": "Invalid Base64 encoding."
}
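Here is a hedged sketch of validation that could produce the error shape above. `base64.b64decode(..., validate=True)` rejects characters outside the Base64 alphabet instead of silently skipping them. The other checks are assumptions: that `language` must be one of the five supported languages and that `audioFormat` must be `mp3`. The extra error messages are also assumptions; only "Invalid Base64 encoding." appears in the documented response. `validate_request` is a hypothetical name.

```python
import base64
from typing import Union

SUPPORTED_LANGUAGES = {"Tamil", "English", "Hindi", "Malayalam", "Telugu"}


def error(message: str) -> dict:
    return {"status": "error", "message": message}


def validate_request(body: dict) -> Union[bytes, dict]:
    """Hypothetical validator: return decoded audio bytes, or an error response."""
    if body.get("language") not in SUPPORTED_LANGUAGES:
        return error("Unsupported language.")  # hypothetical message
    if str(body.get("audioFormat", "")).lower() != "mp3":
        return error("Only mp3 audio is supported.")  # hypothetical message
    try:
        # validate=True raises binascii.Error (a ValueError subclass) on
        # non-alphabet characters or bad padding instead of discarding them.
        audio = base64.b64decode(body.get("audioBase64", ""), validate=True)
    except (ValueError, TypeError):
        return error("Invalid Base64 encoding.")
    if not audio:
        return error("Empty audio payload.")  # hypothetical message
    return audio
```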

☁️ Deployment (Hugging Face Spaces)

This project is Dockerized for Hugging Face Spaces.

  1. Create a new Space on Hugging Face using the Docker SDK.
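  2. Push this repository to the Space's Git remote. Spaces are Git repositories; to keep a GitHub repository in sync, a GitHub Actions workflow can push to the Space.
  3. Hugging Face automatically builds the image from the Dockerfile after each push.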
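  4. Once the build finishes, the API is served from the Space's direct URL: https://YOUR_USERNAME-SPACE_NAME.hf.space/api/voice-detection. The page at https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME only embeds the app, so API requests should go to the .hf.space domain.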

Note: The Dockerfile builds ffmpeg and runs as user 1000 for security compliance on Spaces.

Tip: If the build fails with a registry error, try "Factory Reboot" in the Settings tab.


Developed for the GUVI Hackathon.