---
title: OdioCheck-Backend
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

OdioCheck - Deepfake Voice Detection AI

50.021 Artificial Intelligence Project


OdioCheck is a deepfake audio detection system built to counter the growing use of cloned voices in scams and misinformation. Its custom model uses a hybrid fusion architecture that combines Wav2Vec 2.0 and CQCC features, and it outperformed the three SOTA baselines (Wav2Vec2, AASIST and a CQCC CNN) in the project's evaluation.

🚀 Live Demo

Web Interface: https://odio-check.vercel.app/


🏗️ System Architecture

The project uses a hybrid cloud deployment that splits the frontend and backend across two hosting platforms:

  • Frontend: Static HTML/CSS/JavaScript (frontend/) hosted on Vercel for fast page loads and responsive UI interactions.
  • Backend: A FastAPI server running inside a Docker container on Hugging Face Spaces, which provides the RAM and CPU needed for PyTorch model inference.
  • Model Storage: The .pth model weights (approx. 800 MB) are tracked with Git LFS on Hugging Face, keeping the source code repository lightweight.
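Because the weights live in Git LFS, a clone made without LFS support leaves small text pointer files in place of the real .pth files, and loading them fails. A minimal standard-library sketch for spotting that case, assuming the weights sit in backend/models/ as in the project structure; the helper names are hypothetical and not part of the repository:

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (Git LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` is a Git LFS pointer stub rather than real weights."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unpulled_weights(models_dir: str = "backend/models") -> list[str]:
    """List the .pth files in `models_dir` that are still LFS pointer stubs."""
    return sorted(
        p.name for p in Path(models_dir).glob("*.pth") if is_lfs_pointer(p)
    )


if __name__ == "__main__":
    missing = find_unpulled_weights()
    if missing:
        print("These .pth files are still LFS pointers:", missing)
    else:
        print("No LFS pointer stubs found among the .pth files.")
```

If any file names are printed, running `git lfs pull` in the repository fetches the real weights.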

🧠 Model Requirements Checklist

  • Fully functioning code: Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
  • Baseline models (×3):
    • Wav2Vec2: self-supervised transformer feature extractor (frozen) + attentive pooling classifier.
    • AASIST: graph-based SOTA baseline using a sinc-filter frontend + spectro-temporal heterogeneous graph attention.
    • CQCC Baseline: standard CNN processing Constant-Q Cepstral Coefficients.
  • SOTA Custom Model: ImprovedWav2Vec2CQCCDetector, a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via bidirectional cross-attention, followed by a Graph Attention backend.
  • Ablation Study (×4): Four ablation variants systematically isolate each architectural component to validate the custom model design.
  • Fully Working Frontend: Glassmorphic UI served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV with real-time temporal analysis timeline charts.
  • Cross-lingual Evaluation: Trained on English audio, tested on unseen German audio (MLAAD-tiny) to evaluate out-of-distribution generalisation.
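The bidirectional cross-attention step in ImprovedWav2Vec2CQCCDetector can be illustrated with a minimal NumPy sketch (the project itself uses PyTorch). It assumes both feature streams have already been projected to a shared dimension d, uses a single head with no learned query/key/value projections, and replaces the Graph Attention backend with simple mean pooling. The frame counts and d are hypothetical, so treat this as an illustration of the idea, not the model's actual code:

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention.

    Each query frame (T_q, d) attends over all context frames (T_c, d);
    returns one context summary per query frame, shape (T_q, d).
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)  # (T_q, T_c)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ context                   # (T_q, d)


def bidirectional_fusion(w2v: np.ndarray, cqcc: np.ndarray) -> np.ndarray:
    """Fuse two frame sequences that share feature dimension d.

    Wav2Vec2 frames attend to CQCC frames and vice versa. Each stream keeps a
    residual connection, is mean-pooled over time (a stand-in for the Graph
    Attention backend), and the two pooled vectors are concatenated.
    """
    w2v_enriched = w2v + cross_attention(w2v, cqcc)
    cqcc_enriched = cqcc + cross_attention(cqcc, w2v)
    return np.concatenate([w2v_enriched.mean(axis=0), cqcc_enriched.mean(axis=0)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128                                     # hypothetical shared dimension
    w2v_frames = rng.standard_normal((49, d))   # hypothetical Wav2Vec2 frame count
    cqcc_frames = rng.standard_normal((63, d))  # hypothetical CQCC frame count
    fused = bidirectional_fusion(w2v_frames, cqcc_frames)
    print(fused.shape)  # (256,)
```

The two streams may have different frame counts because attention only requires a shared feature dimension. A trained model would learn projection matrices inside the attention step; how models.py does that, and how the fused frames reach the Graph Attention backend, is not shown here.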

🛠️ Local Installation & Setup

1. Install Dependencies

Ensure you have Python 3.9+ installed.

pip install -r requirements.txt

2. Dataset Download

Download the MLAAD-tiny dataset before training:

pip install -U "huggingface_hub[cli]"
huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
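If you prefer to fetch the dataset from Python, huggingface_hub's snapshot_download performs the same download as the CLI command. The sketch below wraps it and adds a hypothetical helper, count_audio_files, that tallies audio files by extension as a quick sanity check; it makes no assumption about MLAAD-tiny's folder layout:

```python
from collections import Counter
from pathlib import Path

# Formats listed in the feature checklist; used here only for counting.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".m4a"}


def download_mlaad_tiny(local_dir: str = "MLAAD-tiny") -> str:
    """Download the dataset like the CLI command does; returns the local path."""
    from huggingface_hub import snapshot_download  # needs `huggingface_hub` installed

    return snapshot_download(
        repo_id="mueller91/MLAAD-tiny", repo_type="dataset", local_dir=local_dir
    )


def count_audio_files(root: str) -> Counter:
    """Count audio files anywhere under `root`, keyed by lower-cased extension."""
    return Counter(
        p.suffix.lower()
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
```

For example, count_audio_files(download_mlaad_tiny()) downloads the data and returns the per-extension counts.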

3. Training & Evaluation

To train all four primary models (the three baselines and the custom model) plus the four ablation variants:

python backend/train.py

Weights will be saved to backend/models/*.pth.
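The README does not say which metrics train.py reports. Equal error rate (EER), the operating point where the miss rate on fake clips equals the false-alarm rate on real clips, is a common choice in spoofing detection, so here is a dependency-free sketch of it. It assumes higher scores mean "more likely fake" (labels: 1 = fake, 0 = real); the function name is hypothetical and not taken from train.py:

```python
def equal_error_rate(scores: list[float], labels: list[int]) -> float:
    """Approximate EER by sweeping every observed score as a threshold.

    A clip is predicted fake when its score >= threshold. Returns the mean of
    the miss rate and false-alarm rate at the threshold where they are closest.
    """
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("need both real (0) and fake (1) examples")
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # Fake clips scored below the threshold slip through undetected.
        missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        # Real clips scored at or above the threshold are wrongly flagged.
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        miss_rate = missed / n_fake
        false_alarm_rate = false_alarms / n_real
        gap = abs(miss_rate - false_alarm_rate)
        if gap < best_gap:
            best_gap, best_eer = gap, (miss_rate + false_alarm_rate) / 2
    return best_eer
```

This quadratic sweep is fine for small evaluation sets such as a cross-lingual test split; larger sets would normally use a sorted, vectorised ROC computation instead.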


💻 Running the App Locally

Method A: Connect to Production Backend (Default)

By default, the frontend sends requests to the production backend on Hugging Face Spaces. frontend/script.js detects whether the page is being served from localhost; to use a locally running backend instead, switch the backend URL in that file.

Method B: Run Local Backend

uvicorn backend.app:app --reload

Open http://127.0.0.1:8000 to access the interface.
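With the local backend running, you can also send audio to it without the web UI. This standard-library sketch builds a multipart/form-data upload; the /predict route and the "file" form-field name are hypothetical placeholders, so check backend/app.py for the actual route and parameter names before using it:

```python
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:8000"  # local backend started with uvicorn (Method B)
ENDPOINT = "/predict"               # HYPOTHETICAL route; see backend/app.py
FIELD = "file"                      # HYPOTHETICAL form-field name for the upload


def build_upload_request(audio_path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one audio file."""
    boundary = uuid.uuid4().hex
    path = Path(audio_path)
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FIELD}"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body = header + path.read_bytes() + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        return resp.read().decode()
```

For example, print(send(build_upload_request("sample.wav"))) uploads any local audio file and prints whatever the backend returns.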


📁 Project Structure

AI Project/
├── backend/
│   ├── models.py          # All architectures (3 baselines + custom + 4 ablations)
│   ├── dataset.py         # AudioDataset with CQCC caching & augmentation
│   ├── train.py           # Full training & evaluation pipeline
│   ├── app.py             # FastAPI inference server (temporal analysis logic)
│   └── models/            # .pth weights (stored via Git LFS on Hugging Face)
├── frontend/
│   ├── index.html         # UI shell
│   ├── script.js          # "Smart" URL switcher & visualization logic
│   └── style.css          # Glassmorphism design system
├── Dockerfile             # Production container config for Hugging Face
└── requirements.txt       # Python dependencies