---
title: OdioCheck-Backend
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
<img width="1080" height="324" alt="odiocheck" src="https://github.com/user-attachments/assets/4d7b573e-5b0b-4fc7-85de-da60bbb701c2" />
# OdioCheck - Deepfake Voice Detection AI
*50.021 Artificial Intelligence Project*
[Web app](https://odio-check.vercel.app/) · [Backend on Hugging Face Spaces](https://huggingface.co/spaces/JunSiang26/OdioCheck-Backend)
OdioCheck is a deepfake audio detection system built to counter voice clones used in scams and misinformation. Its custom model fuses Wav2Vec 2.0 and CQCC features in a hybrid architecture that the project reports as outperforming its three baselines (Wav2Vec2, AASIST, and a CQCC CNN).
## 🚀 Live Demo
**Web Interface:** [https://odio-check.vercel.app/](https://odio-check.vercel.app/)
---
## 🏗️ System Architecture
The project uses a **Hybrid Cloud** deployment to ensure high performance and scalability:
- **Frontend:** Hosted on **Vercel** for lightning-fast loading and smooth UI interactions.
- **Backend:** A **FastAPI** server running inside a **Docker** container on **Hugging Face Spaces**, which provides the RAM and CPU needed for PyTorch model inference.
- **Model Storage:** Heavy `.pth` model weights (approx. 800 MB) are managed via **Git LFS** on Hugging Face to keep the source code repository lightweight.
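
Because the weights are stored via Git LFS, a clone made without LFS support leaves small text pointer files in place of the real `.pth` files. The sketch below is one way to catch that before loading a model. The function names `is_lfs_pointer` and `unresolved_weights` are illustrative assumptions and are not part of the project.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like a Git LFS pointer rather than real data."""
    with path.open("rb") as fh:
        head = fh.read(len(LFS_HEADER))
    return head == LFS_HEADER


def unresolved_weights(models_dir: Path) -> list[Path]:
    """List .pth files directly under `models_dir` that are still LFS pointers."""
    return sorted(p for p in models_dir.glob("*.pth") if is_lfs_pointer(p))
```

If `unresolved_weights(Path("backend/models"))` returns any paths, running `git lfs pull` in the repository should replace the pointers with the actual weights.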
---
## 🧠 Model Requirements Checklist
- [x] **Fully functioning code:** Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
- [x] **Baseline models (×3):**
  - **Wav2Vec2:** self-supervised transformer feature extractor (frozen) + attentive pooling classifier.
  - **AASIST:** graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention.
  - **CQCC Baseline:** standard CNN processing Constant-Q Cepstral Coefficients.
- [x] **SOTA Custom Model:** `ImprovedWav2Vec2CQCCDetector`, a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via **bidirectional cross-attention**, followed by a **Graph Attention** backend.
- [x] **Ablation Study (×4):** Four ablation variants systematically isolate each architectural component to validate the custom model design.
- [x] **Fully Working Frontend:** Glassmorphic UI served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV with real-time **temporal analysis timeline charts**.
- [x] **Cross-lingual Evaluation:** Trained on English audio, tested on unseen German audio (MLAAD-tiny) to evaluate out-of-distribution generalisation.
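
The custom model above is described as fusing Wav2Vec 2.0 and CQCC features via bidirectional cross-attention. As a rough illustration of that idea only, and not the code in `backend/models.py`, the sketch below runs scaled dot-product attention in both directions using plain Python lists. It assumes both feature streams share the same feature dimension, and it leaves out learned projections, multiple heads, and the graph attention backend. All function names are hypothetical.

```python
import math

Matrix = list[list[float]]  # one row per frame, one column per feature


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention: each query frame attends over `context`.

    Simplification: keys and values are the raw context frames, with no
    learned projection matrices.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        # Weighted sum of context frames, one value per feature dimension.
        out.append([sum(w * v[d] for w, v in zip(weights, context))
                    for d in range(len(context[0]))])
    return out


def bidirectional_fusion(w2v: Matrix, cqcc: Matrix) -> tuple[Matrix, Matrix]:
    """Attend in both directions and add the result back as a residual."""
    w2v_ctx = cross_attend(w2v, cqcc)    # Wav2Vec2 frames query CQCC frames
    cqcc_ctx = cross_attend(cqcc, w2v)   # CQCC frames query Wav2Vec2 frames
    fused_w2v = [[a + b for a, b in zip(x, y)] for x, y in zip(w2v, w2v_ctx)]
    fused_cqcc = [[a + b for a, b in zip(x, y)] for x, y in zip(cqcc, cqcc_ctx)]
    return fused_w2v, fused_cqcc
```

Each stream keeps its own sequence length, so the two outputs can later be pooled or concatenated by whatever backend follows.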
---
## 🛠️ Local Installation & Setup
### 1. Install Dependencies
Ensure you have Python 3.9+ installed.
```bash
pip install -r requirements.txt
```
### 2. Dataset Download
Download the `MLAAD-tiny` dataset before training:
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
```
### 3. Training & Evaluation
To train all 4 primary models and 4 ablation variants:
```bash
python backend/train.py
```
*Weights will be saved to `backend/models/*.pth`.*
---
## 💻 Running the App Locally
### Method A: Connect to Production Backend (Default)
By default the frontend talks to the production backend. It detects when it is running on `localhost`, and it can be switched to point to your local backend in `frontend/script.js`.
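
The switching logic itself lives in `frontend/script.js`, which is not reproduced here. The Python sketch below only illustrates the decision described: use the local backend when the page is served from a local host and the switch is enabled, otherwise use production. The function name, the `use_local` flag, and the production URL placeholder are all assumptions.

```python
LOCAL_HOSTS = {"localhost", "127.0.0.1"}
LOCAL_BACKEND = "http://127.0.0.1:8000"         # uvicorn's default host and port
PRODUCTION_BACKEND = "https://backend.example"  # placeholder, not the real URL


def resolve_backend_url(page_hostname: str, use_local: bool = False) -> str:
    """Pick the API base URL for the given page hostname.

    Production stays the default; the local backend is used only when the
    page is served locally AND the caller opts in.
    """
    if use_local and page_hostname.lower() in LOCAL_HOSTS:
        return LOCAL_BACKEND
    return PRODUCTION_BACKEND
```

Keeping production as the fallback means a deployed page can never end up pointing at a developer machine by accident.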
### Method B: Run Local Backend
```bash
uvicorn backend.app:app --reload
```
Open **http://127.0.0.1:8000** to access the interface.
---
## 📁 Project Structure
```
AI Project/
├── backend/
│   ├── models.py        # All architectures (3 baselines + custom + 4 ablations)
│   ├── dataset.py       # AudioDataset with CQCC caching & augmentation
│   ├── train.py         # Full training & evaluation pipeline
│   ├── app.py           # FastAPI inference server (temporal analysis logic)
│   └── models/          # .pth weights (stored via Git LFS on Hugging Face)
├── frontend/
│   ├── index.html       # UI shell
│   ├── script.js        # "Smart" URL switcher & visualization logic
│   └── style.css        # Glassmorphism design system
├── Dockerfile           # Production container config for Hugging Face
└── requirements.txt     # Python dependencies
```
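
`app.py` is listed as holding the temporal analysis logic behind the timeline charts. One common way to build such a timeline is to score overlapping windows of the waveform, and the sketch below assumes that approach rather than reproducing the project's code. The window and hop lengths, the function names, and the `score_fn` callback (standing in for a model's fake-probability output) are hypothetical.

```python
from typing import Callable, Sequence


def window_starts(num_samples: int, window: int, hop: int) -> list[int]:
    """Start indices of fixed-length windows; short clips yield one window."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if num_samples <= window:
        return [0]
    return list(range(0, num_samples - window + 1, hop))


def temporal_timeline(
    samples: Sequence[float],
    sample_rate: int,
    score_fn: Callable[[Sequence[float]], float],
    window_s: float = 2.0,
    hop_s: float = 1.0,
) -> list[dict]:
    """Score each window and return chart-ready points."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    timeline = []
    for start in window_starts(len(samples), window, hop):
        chunk = samples[start:start + window]
        timeline.append({
            "start_s": start / sample_rate,
            "end_s": (start + len(chunk)) / sample_rate,
            "fake_probability": score_fn(chunk),
        })
    return timeline
```

With this windowing, a tail shorter than one hop is left unscored. Padding the final window would be one way to cover it.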
---