title: OdioCheck-Backend
emoji: ποΈ
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
OdioCheck - Deepfake Voice Detection AI
50.021 Artificial Intelligence Project
OdioCheck is a deepfake audio detection system built to counter the rising threat of voice clones used in scams and misinformation. Its hybrid fusion architecture, which combines Wav2Vec 2.0 and CQCC features, is reported to outperform the baseline models listed below.
Live Demo
Web Interface: https://odio-check.vercel.app/
System Architecture
The project uses a Hybrid Cloud deployment to ensure high performance and scalability:
- Frontend: Hosted on Vercel for lightning-fast loading and smooth UI interactions.
- Backend: A FastAPI server running inside a Docker container on Hugging Face Spaces, providing the high RAM and CPU required for PyTorch model inference.
- Model Storage: Heavy .pth model weights (approx. 800 MB) are managed via Git LFS on Hugging Face to keep the source code repository lightweight.
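Because the weights live in Git LFS, a clone made without LFS installed contains small text pointer stubs instead of the real .pth files. The sketch below is one way to detect that before loading a model. It relies only on the fact that Git LFS pointer files begin with a fixed version line; the helper names `is_lfs_pointer` and `find_unfetched_weights` are hypothetical and not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, as defined by the Git LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` holds an LFS pointer stub rather than real data."""
    with path.open("rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched_weights(models_dir: Path) -> list[Path]:
    """List .pth files under `models_dir` that are still pointer stubs.

    Hypothetical helper: the directory layout is an assumption taken from
    the project structure described in this README.
    """
    return sorted(p for p in models_dir.glob("*.pth") if is_lfs_pointer(p))
```

If `find_unfetched_weights(Path("backend/models"))` returns any paths, running `git lfs pull` in the repository should replace the stubs with the actual weights.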
🧠 Model Requirements Checklist
- Fully functioning code: Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
- Baseline models (×3):
  - Wav2Vec2: self-supervised transformer feature extractor (frozen) + attentive pooling classifier.
  - AASIST: graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention.
  - CQCC Baseline: standard CNN processing Constant-Q Cepstral Coefficients.
- SOTA Custom Model: ImprovedWav2Vec2CQCCDetector, a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via bidirectional cross-attention, followed by a Graph Attention backend.
- Ablation Study (×4): Four ablation variants systematically isolate each architectural component to validate the custom model design.
- Fully Working Frontend: Glassmorphic UI served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV with real-time temporal analysis timeline charts.
- Cross-lingual Evaluation: Trained on English audio, tested on unseen German audio (MLAAD-tiny) to evaluate out-of-distribution generalisation.
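The bidirectional cross-attention used by the custom model can be illustrated with a minimal, dependency-free sketch. This is not the code in backend/models.py: it omits the learned query/key/value projections, multiple heads, and normalisation that a PyTorch implementation would normally have, and it assumes both feature streams have already been projected to the same dimension. The names `attend` and `bidirectional_fusion` are hypothetical.

```python
import math

Vector = list[float]
Frames = list[Vector]  # one feature vector per time frame


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Frames, context: Frames) -> Frames:
    """Scaled dot-product attention without learned projections.

    Each query frame is replaced by a weighted mean of the context frames,
    weighted by how similar the query is to each context frame.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        attended.append([
            sum(w * k[d] for w, k in zip(weights, context))
            for d in range(len(context[0]))
        ])
    return attended


def bidirectional_fusion(stream_a: Frames, stream_b: Frames) -> tuple[Frames, Frames]:
    """Let each stream attend to the other, then add the result residually."""
    a_from_b = attend(stream_a, stream_b)  # e.g. Wav2Vec 2.0 frames query CQCC frames
    b_from_a = attend(stream_b, stream_a)  # e.g. CQCC frames query Wav2Vec 2.0 frames
    fused_a = [[x + y for x, y in zip(f, g)] for f, g in zip(stream_a, a_from_b)]
    fused_b = [[x + y for x, y in zip(f, g)] for f, g in zip(stream_b, b_from_a)]
    return fused_a, fused_b
```

Both outputs keep the frame count of their own input stream, so the two streams may have different lengths in time as long as their feature dimensions match.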
🛠️ Local Installation & Setup
1. Install Dependencies
Ensure you have Python 3.9+ installed.
pip install -r requirements.txt
2. Dataset Download
Download the MLAAD-tiny dataset before training:
pip install -U "huggingface_hub[cli]"
huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
3. Training & Evaluation
To train all 4 primary models and 4 ablation variants:
python backend/train.py
Weights will be saved to backend/models/*.pth.
💻 Running the App Locally
Method A: Connect to Production Backend (Default)
The frontend automatically detects whether it is running on localhost. By default it talks to the production backend; to use your local backend instead, switch the target in frontend/script.js.
Method B: Run Local Backend
uvicorn backend.app:app --reload
Open http://127.0.0.1:8000 to access the interface.
Project Structure
AI Project/
├── backend/
│   ├── models.py        # All architectures (3 baselines + custom + 4 ablations)
│   ├── dataset.py       # AudioDataset with CQCC caching & augmentation
│   ├── train.py         # Full training & evaluation pipeline
│   ├── app.py           # FastAPI inference server (temporal analysis logic)
│   └── models/          # .pth weights (stored via Git LFS on Hugging Face)
├── frontend/
│   ├── index.html       # UI shell
│   ├── script.js        # "Smart" URL switcher & visualization logic
│   └── style.css        # Glassmorphism design system
├── Dockerfile           # Production container config for Hugging Face
└── requirements.txt     # Python dependencies
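The temporal analysis attributed to app.py above is not documented here. One common way to produce a per-time score for a timeline chart is to slide a fixed-length window over the waveform and score each window separately. The sketch below shows that idea only. The function name `temporal_scores`, the window and hop lengths, and the `score_fn` callback (standing in for a model's fake-probability output) are all assumptions, not the project's actual logic.

```python
from typing import Callable, Sequence


def temporal_scores(
    samples: Sequence[float],
    sample_rate: int,
    score_fn: Callable[[Sequence[float]], float],
    window_s: float = 1.0,  # assumed window length in seconds
    hop_s: float = 0.5,     # assumed hop between window starts in seconds
) -> list[tuple[float, float]]:
    """Return (start_time_seconds, score) pairs for overlapping windows.

    `score_fn` is a hypothetical stand-in for model inference on one chunk.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must cover at least one sample")
    if not samples:
        return []

    timeline = []
    # Clips shorter than one window are scored once, as a single short chunk.
    last_start = max(len(samples) - window, 0)
    for start in range(0, last_start + 1, hop):
        chunk = samples[start:start + window]
        timeline.append((start / sample_rate, score_fn(chunk)))
    return timeline
```

With a 50% overlap, each moment of audio is covered by up to two windows, which smooths the chart at the cost of roughly twice as many inference calls.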