---
title: OdioCheck-Backend
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
<img width="1080" height="324" alt="odiocheck" src="https://github.com/user-attachments/assets/4d7b573e-5b0b-4fc7-85de-da60bbb701c2" />
# OdioCheck - Deepfake Voice Detection AI
*50.021 Artificial Intelligence Project*
[![Live Demo](https://img.shields.io/badge/Live%20Demo-Vercel-brightgreen?style=for-the-badge&logo=vercel)](https://odio-check.vercel.app/)
[![Backend](https://img.shields.io/badge/Backend-Hugging%20Face-yellow?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/JunSiang26/OdioCheck-Backend)
OdioCheck is a deepfake audio detection system built to counter voice clones used in scams and misinformation. Its custom model fuses Wav2Vec 2.0 and CQCC features in a hybrid architecture, which the project reports as outperforming the baseline models it is compared against.
## 🚀 Live Demo
**Web Interface:** [https://odio-check.vercel.app/](https://odio-check.vercel.app/)
---
## 🏗️ System Architecture
The project uses a **Hybrid Cloud** deployment to ensure high performance and scalability:
- **Frontend:** Hosted on **Vercel** for lightning-fast loading and smooth UI interactions.
- **Backend:** A **FastAPI** server running inside a **Docker** container on **Hugging Face Spaces**, which provides the RAM and CPU needed for PyTorch model inference.
- **Model Storage:** Heavy `.pth` model weights (approx. 800 MB) are managed via **Git LFS** on Hugging Face to keep the source code repository lightweight.
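
Because the weights live in Git LFS, a clone made without LFS support leaves small text pointer files where the `.pth` files should be. The sketch below is one hedged way to detect that before trying to load a model. The helper names `is_lfs_pointer` and `find_missing_weights` are hypothetical and not part of this repository; the `backend/models` path is taken from the project structure.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (Git LFS pointer format, v1).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an un-pulled Git LFS pointer stub."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def find_missing_weights(models_dir: Path) -> list[Path]:
    """List .pth files directly under `models_dir` that are still LFS pointers."""
    return sorted(p for p in models_dir.glob("*.pth") if is_lfs_pointer(p))


if __name__ == "__main__":
    # Assumed location of the weights, relative to the repository root.
    for stub in find_missing_weights(Path("backend/models")):
        print(f"Not downloaded yet (try `git lfs pull`): {stub}")
```

If the script prints nothing, either the real weights are present or no `.pth` files exist yet (for example, before training).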
---
## 🧠 Model Requirements Checklist
- [x] **Fully functioning code:** Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
- [x] **Baseline models (×3):**
  - **Wav2Vec2**: self-supervised transformer feature extractor (frozen) + attentive pooling classifier.
  - **AASIST**: graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention.
  - **CQCC Baseline**: standard CNN processing Constant-Q Cepstral Coefficients.
- [x] **SOTA Custom Model:** `ImprovedWav2Vec2CQCCDetector`, a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via **bidirectional cross-attention**, followed by a **Graph Attention** backend.
- [x] **Ablation Study (×4):** Four ablation variants systematically isolate each architectural component to validate the custom model design.
- [x] **Fully Working Frontend:** Glassmorphic UI served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV with real-time **temporal analysis timeline charts**.
- [x] **Cross-lingual Evaluation:** Trained on English audio, tested on unseen German audio (MLAAD-tiny) to evaluate out-of-distribution generalisation.
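
The temporal analysis timeline in the checklist can be pictured as scoring short, overlapping windows of a clip and plotting one score per window. The following is a minimal sketch of that idea, not the logic in `backend/app.py`. The window and hop lengths, the function names, and the `score_fn` callback (a stand-in for a model forward pass) are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def window_bounds(num_samples: int, window: int, hop: int) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of windows covering the clip."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples <= window:
        # Clip shorter than one window: score it as a single chunk.
        return [(0, num_samples)] if num_samples > 0 else []
    bounds = [(s, s + window) for s in range(0, num_samples - window + 1, hop)]
    # Add a final window flush with the end so the tail is not dropped.
    if bounds[-1][1] < num_samples:
        bounds.append((num_samples - window, num_samples))
    return bounds


def temporal_scores(
    samples: Sequence[float],
    sample_rate: int,
    score_fn: Callable[[Sequence[float]], float],
    window_s: float = 2.0,  # assumed window length in seconds
    hop_s: float = 1.0,     # assumed hop length in seconds
) -> list[dict]:
    """Score each window; returns rows that a timeline chart could plot."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    return [
        {
            "start_s": start / sample_rate,
            "end_s": end / sample_rate,
            "fake_prob": float(score_fn(samples[start:end])),
        }
        for start, end in window_bounds(len(samples), window, hop)
    ]


if __name__ == "__main__":
    sr = 16_000                      # assumed sample rate
    clip = [0.0] * (5 * sr)          # five seconds of silence as dummy input
    for row in temporal_scores(clip, sr, lambda chunk: 0.5):  # dummy scorer
        print(row)
```

Overlapping windows trade extra inference calls for a smoother timeline; a larger hop is cheaper but localises a spliced segment less precisely.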
---
## 🛠️ Local Installation & Setup
### 1. Install Dependencies
Ensure you have Python 3.9+ installed.
```bash
pip install -r requirements.txt
```
### 2. Dataset Download
Download the `MLAAD-tiny` dataset before training:
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
```
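
After the download finishes, a quick count of audio files by extension can confirm that the dataset actually landed in `MLAAD-tiny/`. This is a small sketch under assumptions: the helper name `count_audio_files` is hypothetical, and the extension list is a guess at what the folder may contain rather than a description of the dataset's real layout.

```python
from collections import Counter
from pathlib import Path

# Assumed set of audio extensions; adjust to match the downloaded files.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def count_audio_files(root: Path) -> Counter:
    """Count audio files under `root` (recursively), keyed by lowercase extension."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in AUDIO_EXTENSIONS:
            counts[suffix] += 1
    return counts


if __name__ == "__main__":
    counts = count_audio_files(Path("MLAAD-tiny"))
    if not counts:
        print("No audio files found; check that the download completed.")
    for ext, n in sorted(counts.items()):
        print(f"{ext}: {n} files")
```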
### 3. Training & Evaluation
To train all 4 primary models and 4 ablation variants:
```bash
python backend/train.py
```
*Weights will be saved to `backend/models/*.pth`.*
---
## 💻 Running the App Locally
### Method A: Connect to Production Backend (Default)
By default, the frontend talks to the production backend on Hugging Face Spaces. It detects when it is running on `localhost`, and you can switch it to point to your local backend in `frontend/script.js`.
### Method B: Run Local Backend
```bash
uvicorn backend.app:app --reload
```
Open **http://127.0.0.1:8000** to access the interface.
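
To confirm the local server is actually listening before opening the browser, a small standard-library probe is enough. The sketch below only requests the root URL given above; the helper name `probe` is hypothetical, and it makes no assumption about which API routes `backend/app.py` exposes.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if nothing answers."""
    # Bypass any system proxy: the target is a server on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server is up but answered with an error status
    except OSError:
        return None  # connection refused, timeout, or similar failure


if __name__ == "__main__":
    status = probe("http://127.0.0.1:8000")
    if status is None:
        print("Backend unreachable; is uvicorn running?")
    else:
        print(f"Backend answered with HTTP {status}")
```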
---
## 📁 Project Structure
```
AI Project/
├── backend/
│   ├── models.py        # All architectures (3 baselines + custom + 4 ablations)
│   ├── dataset.py       # AudioDataset with CQCC caching & augmentation
│   ├── train.py         # Full training & evaluation pipeline
│   ├── app.py           # FastAPI inference server (temporal analysis logic)
│   └── models/          # .pth weights (stored via Git LFS on Hugging Face)
├── frontend/
│   ├── index.html       # UI shell
│   ├── script.js        # "Smart" URL switcher & visualization logic
│   └── style.css        # Glassmorphism design system
├── Dockerfile           # Production container config for Hugging Face
└── requirements.txt     # Python dependencies
```
---