---
title: AudioShield AI Voice Detector
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---
# AudioShield AI: Voice Fraud Detection System
> **Problem Statement 01**: AI-Generated Voice Detection for Regional Languages
![Architecture](Architechture.png)
## 🚀 Overview
**AudioShield AI** is a REST API that detects AI-generated voice deepfakes. Built for the **GUVI Hackathon**, it specifically addresses the challenge of identifying synthetic audio in **Tamil, English, Hindi, Malayalam, and Telugu**.
Rather than relying on a single detector, AudioShield uses a **Multi-Model Voting Ensemble**: it aggregates the outputs of four Wav2Vec2-based models to reach a final decision that is more reliable than any one model alone.
## 🎯 Problem It Solves
With the rise of Generative AI, voice scams and deepfakes are becoming indistinguishable from reality. Financial fraud, impersonation, and misinformation are growing threats. AudioShield provides a robust, scalable defense mechanism that can be integrated into calls, messaging apps, and verification systems.
## ✨ Key Features
* **🛡️ Voting Ensemble Power**: Leverages 4 distinct AI models (MelodyMachine, Mo-Creator, Hemgg, Gustking-XLSR) to minimize false positives.
* **🌍 Multi-Lingual Support**: Optimized for Indian regional languages (Tamil, Telugu, Hindi, Malayalam) + English.
* **⚡ Zero Cold Start**: Implements a "Warm-up" routine to ensure the first API request is as fast as the 100th.
* **🚀 Render-Ready**: Configured for seamless deployment on cloud platforms like Render.
* **🔍 Explainable AI**: Provides detailed JSON responses with classification confidence and logic.
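A "Warm-up" routine of this kind can be pictured as one throwaway inference per model at startup, so that lazy initialization happens before real traffic arrives. The following is a minimal sketch, assuming each loaded model is exposed as a plain callable that takes a list of float samples; `warm_up`, the 16 kHz sample rate, and the one-second silent clip are illustrative assumptions, not the project's actual code.

```python
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a model is any callable that scores a list of audio samples.
Model = Callable[[Sequence[float]], float]


def warm_up(models: Dict[str, Model], sample_rate: int = 16000) -> Dict[str, float]:
    """Run one dummy inference per model and return how long each one took.

    The 16 kHz rate and the one-second silent clip are assumptions for illustration.
    """
    silence: List[float] = [0.0] * sample_rate  # one second of silence
    timings: Dict[str, float] = {}
    for name, model in models.items():
        start = time.perf_counter()
        model(silence)  # the result is discarded; only the side effects matter
        timings[name] = time.perf_counter() - start
    return timings
```

Calling such a function once during application startup, before the server begins accepting requests, is what moves the initialization cost away from the first real request.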
## 🏗️ System Architecture
The system follows a **Microservices-ready, Layered Architecture**:
```mermaid
graph TD
User[Client / Postman] -->|"HTTP POST (Base64)"| API[FastAPI Service]
API -->|"Async Thread"| Engine[Detection Engine]
subgraph "Ensemble Committee (The AI Core)"
Engine -->|Input| M1[MelodyMachine]
Engine -->|Input| M2[Mo-Creator]
Engine -->|Input| M3[Hemgg]
Engine -->|Input| M4["Gustking (XLSR)"]
M1 -->|Vote| Agg[Weighted Aggregator]
M2 -->|Vote| Agg
M3 -->|Vote| Agg
M4 -->|Vote| Agg
end
Agg -->|Final Score| Verdict[Classification Logic]
Verdict -->|JSON Response| User
```
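The "Async Thread" edge in the diagram suggests that blocking inference is moved off the event loop. One common way to do this in Python is `asyncio.to_thread` (Python 3.9+). The sketch below uses a hypothetical blocking function, `run_detection`, as a stand-in for the engine; it is not taken from `app.py`.

```python
import asyncio
import time


def run_detection(audio_bytes: bytes) -> dict:
    """Hypothetical stand-in for the blocking detection engine."""
    time.sleep(0.05)  # simulates blocking model inference
    return {"bytes_analyzed": len(audio_bytes)}


async def handle_request(audio_bytes: bytes) -> dict:
    """Offload the blocking call to a worker thread so the event loop stays free."""
    return await asyncio.to_thread(run_detection, audio_bytes)


async def main() -> list:
    # The two requests overlap instead of running back to back.
    return await asyncio.gather(handle_request(b"abc"), handle_request(b"defgh"))
```

Running `asyncio.run(main())` returns the two results in request order, since `asyncio.gather` preserves the order of its arguments.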
### Core Components
1. **FastAPI Layer (`app.py`)**: Handles HTTP requests, validation, and async processing.
2. **Detection Engine (`detect.py`)**: Manages model loading, inference, and the ensemble voting logic.
3. **Models**:
    * `MelodyMachine/Deepfake-audio-detection-V2`
    * `mo-thecreator/Deepfake-audio-detection`
    * `Hemgg/Deepfake-audio-detection`
    * `Gustking/wav2vec2-large-xlsr-deepfake-audio-classification` (The "Expert" model)
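The ensemble voting logic can be sketched as a weighted average of per-model probabilities plus a simple vote count. The weights, the 0.5 threshold, and the short model keys below are assumptions (weighting the "Expert" model more heavily is a guess based on its label above); the actual rule in `detect.py` may differ.

```python
from typing import Dict, Tuple

# Hypothetical weights: the README calls Gustking the "Expert" model, so it is
# weighted more heavily here purely for illustration.
WEIGHTS: Dict[str, float] = {
    "melodymachine": 1.0,
    "mo-creator": 1.0,
    "hemgg": 1.0,
    "gustking-xlsr": 2.0,
}


def aggregate(fake_probs: Dict[str, float], threshold: float = 0.5) -> Tuple[float, int]:
    """Combine per-model P(AI-generated) into a weighted score and a vote count."""
    total_weight = sum(WEIGHTS[name] for name in fake_probs)
    score = sum(WEIGHTS[name] * p for name, p in fake_probs.items()) / total_weight
    votes = sum(1 for p in fake_probs.values() if p >= threshold)
    return score, votes
```

For example, probabilities of 0.9, 0.8, 0.4, and 0.95 (in the order listed in `WEIGHTS`) give a weighted score of about 0.8 with 3 of 4 models voting "AI-generated".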
## 🛠️ Tech Stack
* **Language**: Python 3.10+
* **API Framework**: FastAPI, Uvicorn
* **ML Libraries**: PyTorch, Transformers, Librosa, NumPy
* **Deployment**: Docker-ready, Render-compatible
## 🚀 Installation & Usage
### 1. Clone the Repository
```bash
git clone https://github.com/krish1440/AI-Generated-Voice-Detection.git
cd AI-Generated-Voice-Detection
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Run the Server
```bash
python app.py
```
*The server will start on port `8000` (or the PORT env var).*
*Note: On the first run, the server downloads the required model weights (approx. 2-3 GB).*
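The port fallback described above can be sketched as follows. `resolve_port` is a hypothetical helper name, and how `app.py` actually reads the variable is not shown in this README.

```python
import os


def resolve_port(default: int = 8000) -> int:
    """Return the port from the PORT environment variable, or the default."""
    raw = os.environ.get("PORT", "")
    # Fall back to the default when PORT is unset or not a plain integer.
    return int(raw) if raw.strip().isdigit() else default
```

The result could then be passed to the server, for instance as `uvicorn.run(app, host="0.0.0.0", port=resolve_port())`, assuming `app` is the FastAPI application object.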
## 🔌 API Documentation
### Detect Voice
**Endpoint**: `POST /api/voice-detection`
**Request Body** (JSON):
```json
{
  "language": "Tamil",
  "audioFormat": "mp3",
  "audioBase64": "<Base64 encoded MP3 string>"
}
```
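A client can assemble this request body with the standard library alone. The sketch below is one possible way to do it; `build_payload` and `build_request` are hypothetical helper names, and the base URL is whatever host the server runs on.

```python
import base64
import json
import urllib.request


def build_payload(audio_bytes: bytes, language: str = "Tamil") -> dict:
    """Build the request body in the shape shown above."""
    return {
        "language": language,
        "audioFormat": "mp3",
        "audioBase64": base64.b64encode(audio_bytes).decode("ascii"),
    }


def build_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the detection endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/voice-detection",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a local server, pass the prepared request to `urllib.request.urlopen`, for example with `build_request("http://localhost:8000", build_payload(mp3_bytes))`, where `mp3_bytes` holds the raw contents of an MP3 file.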
**Response** (Success):
```json
{
  "status": "success",
  "language": "Tamil",
  "classification": "AI_GENERATED",
  "confidenceScore": 0.98,
  "explanation": "Ensemble Analysis: 4/4 models flagged this audio as AI-generated."
}
```
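A response of this shape could be assembled from an ensemble score and a vote count roughly as follows. This is a sketch under assumptions: the 0.5 threshold, the `"HUMAN"` label for the negative class, and reporting the complement of the score as confidence for that class are all guesses, since only the `AI_GENERATED` case is documented here.

```python
def build_response(language: str, score: float, votes: int,
                   n_models: int = 4, threshold: float = 0.5) -> dict:
    """Map an ensemble score in [0, 1] and a vote count to the JSON shape above."""
    is_ai = score >= threshold
    return {
        "status": "success",
        "language": language,
        # "HUMAN" is an assumed label; the README only shows "AI_GENERATED".
        "classification": "AI_GENERATED" if is_ai else "HUMAN",
        # Confidence refers to the predicted class, so flip the score for HUMAN.
        "confidenceScore": round(score if is_ai else 1.0 - score, 2),
        "explanation": (
            f"Ensemble Analysis: {votes}/{n_models} models "
            "flagged this audio as AI-generated."
        ),
    }
```

With `build_response("Tamil", 0.98, 4)` the function reproduces the success example shown above.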
**Response** (Error):
```json
{
  "status": "error",
  "message": "Invalid Base64 encoding."
}
```
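An error like this one can be produced by decoding the payload strictly before any model runs. The sketch below uses the standard library's `base64.b64decode` with `validate=True`; `decode_audio` is a hypothetical helper name, and the service's real validation may be stricter (for example, checking that the bytes are a valid MP3).

```python
import base64
import binascii
from typing import Optional, Tuple


def decode_audio(audio_base64: str) -> Tuple[Optional[bytes], Optional[dict]]:
    """Return (audio_bytes, None) on success or (None, error_response) on failure."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently skipping them.
        audio = base64.b64decode(audio_base64, validate=True)
    except (binascii.Error, ValueError):
        return None, {"status": "error", "message": "Invalid Base64 encoding."}
    return audio, None
```

Returning the error as data rather than raising keeps the handler simple: it can send the dictionary back to the client as-is.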
## ☁️ Deployment (Hugging Face Spaces)
This project is Dockerized for Hugging Face Spaces.
1. Create a new **Space** on Hugging Face using the **Docker** SDK.
2. Connect your GitHub repository.
3. Hugging Face will automatically build using the `Dockerfile`.
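4. The API will be live at `https://YOUR_USERNAME-SPACE_NAME.hf.space/api/voice-detection`, the Space's direct URL. (`https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME` is the Space's page on the Hub, not the API host.)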
**Note**: The Dockerfile builds `ffmpeg` and runs as user `1000` for security compliance on Spaces.
> **Tip**: If the build fails with a registry error, try "Factory Reboot" in the Settings tab.
---
*Developed for GUVI Hackathon.*