Spaces: JunSiang26/OdioCheck-Backend


JunSiang26 committed on Apr 16

Commit 2978fce

1 Parent(s): 409c94a

Update README with live demo link


Files changed (1)

README.md +52 -69


@@ -12,106 +12,89 @@ pinned: false
 # OdioCheck - Deepfake Voice Detection AI
 *50.021 Artificial Intelligence Project*
-## Theme
-**AI for Security & Social Good** (UN SDG #16: Peace, Justice, and Strong Institutions)
-OdioCheck tackles the rising threat of audio deepfakes used in scams and misdirection.
-## Requirements Checklist
 - [x] **Fully functioning code:** Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
 - [x] **Baseline models (×3):**
-  - **Wav2Vec2** — self-supervised transformer feature extractor (frozen) + attentive pooling classifier (`backend/models.py`)
-  - **AASIST** — graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention (`backend/models.py`)
-  - **CQCC Baseline** — standard CNN processing Constant-Q Cepstral Coefficients (`backend/models.py`)
-- [x] **SOTA Custom Model:** `ImprovedWav2Vec2CQCCDetector` — a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via **bidirectional cross-attention**, followed by a **Graph Attention** backend (`backend/models.py`).
-- [x] **Ablation Study (×4):** Four ablation variants systematically isolate each architectural component to validate the custom model design:
-  - **Ablation 1** — Wav2Vec2 + Graph (no CQCC, no cross-attention)
-  - **Ablation 2** — CQCC + Graph (no Wav2Vec2, no cross-attention)
-  - **Ablation 3** — Wav2Vec2 + CQCC + Simple Concat + Graph (no cross-attention)
-  - **Ablation 4** — Wav2Vec2 + CQCC + Cross-Attention + Linear (no Graph Attention)
-- [x] **Fully Working Frontend:** Glassmorphic UI (Tailwind + Vanilla JS) served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV. Shows **side-by-side** predictions from all four primary models with real-time animated confidence bars and a per-window **temporal analysis timeline chart**.
-- [x] **Cross-lingual Dataset Split:** Trained on English audio (`MLAAD-tiny/en`), tested on unseen German audio (`MLAAD-tiny/de`) for out-of-distribution generalisation evaluation.
-- [x] **CQCC Feature Caching:** Pre-computed CQCC tensors are cached to disk to avoid redundant computation across training runs.
 ---
-## Installation
-Ensure you have Python 3.9+ installed. Install all dependencies:
 ```bash
 pip install -r requirements.txt
 ```
-### Dataset Download
-We use the `MLAAD-tiny` dataset (multi-language audio deepfakes). Download it from Hugging Face before training:
 ```bash
 pip install -U "huggingface_hub[cli]"
 huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
 ```
----
-## Running the Project
-### Step 1 — (Optional) Pre-compute CQCC Cache
-Pre-computing CQCC features once dramatically speeds up all subsequent training runs:
-```bash
-python backend/train.py --precompute-cqcc-only
-```
-### Step 2 — Train All Models
-Trains all 4 primary models + 4 ablation variants, evaluates on the German test set, and saves `.pth` weights to `backend/models/`:
 ```bash
 python backend/train.py
 ```
-#### Available Training Flags
-| Flag | Default | Description |
-|---|---|---|
-| `--val-split F` | `0.2` | Fraction of English data reserved for validation (0–0.5). |
-| `--data-dir PATH` | auto | Override dataset root (must contain `original/` and `fake/` folders). |
-| `--cqcc-cache-dir PATH` | `backend/precomputed_features/cqcc` | Where to read/write cached CQCC tensors. |
-| `--precompute-cqcc-only` | `False` | Build CQCC cache and exit without training. |
-| `--force-rebuild-cqcc` | `False` | Recompute CQCC cache even if files already exist. |
-| `--smoke-test` | `False` | Run one forward pass through every model and exit — useful for verifying setup. |
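The flag table above maps naturally onto an `argparse` parser. The following is a minimal sketch, assuming `backend/train.py` exposes exactly the flags and defaults listed in the table; the helper names `val_fraction` and `build_parser` are hypothetical, and treating the 0–0.5 range as inclusive is an assumption.

```python
import argparse


def val_fraction(text):
    """Parse --val-split and enforce the documented 0-0.5 range (assumed inclusive)."""
    value = float(text)
    if not 0.0 <= value <= 0.5:
        raise argparse.ArgumentTypeError("--val-split must be between 0 and 0.5")
    return value


def build_parser():
    """Hypothetical parser mirroring the flags and defaults from the table."""
    parser = argparse.ArgumentParser(description="Train and evaluate OdioCheck models")
    parser.add_argument("--val-split", type=val_fraction, default=0.2, metavar="F")
    # None stands in for the table's "auto" default (dataset root auto-detected).
    parser.add_argument("--data-dir", default=None, metavar="PATH")
    parser.add_argument(
        "--cqcc-cache-dir",
        default="backend/precomputed_features/cqcc",
        metavar="PATH",
    )
    parser.add_argument("--precompute-cqcc-only", action="store_true")
    parser.add_argument("--force-rebuild-cqcc", action="store_true")
    parser.add_argument("--smoke-test", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Validating `--val-split` inside the `type` callable means an out-of-range value is rejected at parse time, before any dataset loading starts.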
-#### Quick Smoke Test
-Verify all models initialise and run a forward pass correctly without full training:
-```bash
-python backend/train.py --smoke-test
-```
-### Step 3 — Start the Web Interface
 ```bash
 uvicorn backend.app:app --reload
 ```
-Open **http://127.0.0.1:8000** in your browser. Upload any audio file (WAV, MP3, OGG, FLAC, M4A) to see simultaneous predictions from all four primary models plus an animated temporal confidence chart.
 ---
-## Project Architecture
 ```
 AI Project/
 ├── backend/
-│   ├── models.py          # All model architectures (3 baselines + custom + 4 ablations)
-│   ├── dataset.py         # AudioDataset with CQCC caching + data augmentation
-│   ├── train.py           # Full training + evaluation pipeline (CLI-driven)
-│   ├── app.py             # FastAPI inference server (windowed temporal analysis)
-│   ├── preprocess.py      # Standalone preprocessing utilities
-│   └── models/            # Saved .pth weight files (generated after training)
 ├── frontend/
-│   ├── index.html         # Glassmorphic UI shell
-│   ├── script.js          # File upload, Chart.js timeline, model panel rendering
-│   └── style.css          # Custom glassmorphism styles
-├── MLAAD-tiny/            # Dataset (downloaded separately)
-├── requirements.txt       # Python dependencies
-└── colab_training_notebook.ipynb  # Google Colab training notebook
 ```
 ---
-## Working with Other Datasets
-To replace MLAAD-tiny with another dataset (e.g., ASVspoof):
-1. Place your `fake/` and `original/` (or `real/`) audio folders into a `data/` directory at the project root.
-2. The `AudioDataset` in `dataset.py` auto-detects and falls back to the `data/` directory if `MLAAD-tiny` is absent.
-3. Re-run `python backend/train.py`. The full pipeline runs identically.

 # OdioCheck - Deepfake Voice Detection AI
 *50.021 Artificial Intelligence Project*
+[![Live Demo](https://img.shields.io/badge/Live%20Demo-Vercel-brightgreen?style=for-the-badge&logo=vercel)](https://odio-check.vercel.app/)
+[![Backend](https://img.shields.io/badge/Backend-Hugging%20Face-yellow?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/JunSiang26/OdioCheck-Backend)
+OdioCheck is a deepfake audio detection system built to counter voice clones used in scams and misinformation. Its custom model fuses Wav2Vec 2.0 and CQCC features and is designed to outperform the three baseline models listed in the checklist below.
+## 🚀 Live Demo
+**Web Interface:** [https://odio-check.vercel.app/](https://odio-check.vercel.app/)
+---
+## 🏗️ System Architecture
+The project splits its deployment across two hosts so that the UI and model inference are served independently:
+- **Frontend:** Hosted on **Vercel** for lightning-fast loading and smooth UI interactions.
+- **Backend:** A **FastAPI** server running inside a **Docker** container on **Hugging Face Spaces**, which provides the RAM and CPU required for PyTorch model inference.
+- **Model Storage:** Heavy `.pth` model weights (approx. 800 MB) are managed via **Git LFS** on Hugging Face to keep the source code repository lightweight.
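
Because the weights live in Git LFS, a clone made without LFS support contains small text pointer stubs instead of the real `.pth` files, and loading them fails. The sketch below is one way to detect that situation before starting the server. The function names are hypothetical, and the `backend/models` default is taken from the project structure in this README.

```python
from pathlib import Path

# First line of every Git LFS pointer file, as defined by the Git LFS spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is a Git LFS pointer stub rather than real weights."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched_weights(models_dir="backend/models"):
    """List .pth files under `models_dir` that are still LFS pointer stubs."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.pth") if is_lfs_pointer(p))


if __name__ == "__main__":
    missing = find_unfetched_weights()
    if missing:
        print("Weights not fetched from LFS:", ", ".join(missing))
```

If the check reports any files, running `git lfs pull` in the repository should replace the stubs with the actual weights.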
+---
+## 🧠 Model Requirements Checklist
 - [x] **Fully functioning code:** Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
 - [x] **Baseline models (×3):**
+  - **Wav2Vec2** — self-supervised transformer feature extractor (frozen) + attentive pooling classifier.
+  - **AASIST** — graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention.
+  - **CQCC Baseline** — standard CNN processing Constant-Q Cepstral Coefficients.
+- [x] **SOTA Custom Model:** `ImprovedWav2Vec2CQCCDetector` — a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via **bidirectional cross-attention**, followed by a **Graph Attention** backend.
+- [x] **Ablation Study (×4):** Four ablation variants systematically isolate each architectural component to validate the custom model design.
+- [x] **Fully Working Frontend:** Glassmorphic UI served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV with real-time **temporal analysis timeline charts**.
+- [x] **Cross-lingual Evaluation:** Trained on English audio, tested on unseen German audio (MLAAD-tiny) to evaluate out-of-distribution generalisation.
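
The temporal analysis timeline in the checklist implies that the backend scores the audio window by window rather than as a single clip. The following is a minimal sketch of that idea, not the project's actual `app.py` logic: the 16 kHz sample rate, 4 s window, and 2 s hop are illustrative assumptions, and `window_bounds` and `timeline` are hypothetical helper names.

```python
def window_bounds(num_samples, sample_rate=16000, window_s=4.0, hop_s=2.0):
    """Return (start, end) sample indices for overlapping analysis windows.

    sample_rate, window_s and hop_s are illustrative values, not the
    project's actual settings.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if num_samples <= 0 or window <= 0 or hop <= 0:
        return []
    if num_samples <= window:
        return [(0, num_samples)]  # short clip: one window covering everything
    bounds = []
    start = 0
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += hop
    # Cover any leftover tail with a final window aligned to the end of the clip.
    if bounds[-1][1] < num_samples:
        bounds.append((num_samples - window, num_samples))
    return bounds


def timeline(scores, bounds, sample_rate=16000):
    """Pair each window's fake-probability with its midpoint time in seconds."""
    return [
        {"time_s": (start + end) / 2 / sample_rate, "fake_prob": prob}
        for (start, end), prob in zip(bounds, scores)
    ]
```

Each model would be run once per window, and the resulting list of `time_s` and `fake_prob` pairs is the kind of series a timeline chart can plot directly.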
 ---
+## 🛠️ Local Installation & Setup
+### 1. Install Dependencies
+Ensure you have Python 3.9+ installed.
 ```bash
 pip install -r requirements.txt
 ```
+### 2. Dataset Download
+Download the `MLAAD-tiny` dataset before training:
 ```bash
 pip install -U "huggingface_hub[cli]"
 huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
 ```
+### 3. Training & Evaluation
+To train all 4 primary models and 4 ablation variants:
 ```bash
 python backend/train.py
 ```
+*Weights will be saved to `backend/models/*.pth`.*
+---
+## 💻 Running the App Locally
+### Method A: Connect to Production Backend (Default)
+The frontend detects whether it is running on `localhost`. By default it talks to the production backend; to use a local backend instead, change the backend URL in `frontend/script.js`.
+### Method B: Run Local Backend
 ```bash
 uvicorn backend.app:app --reload
 ```
+Open **http://127.0.0.1:8000** to access the interface.
 ---
+## 📁 Project Structure
 ```
 AI Project/
 ├── backend/
+│   ├── models.py          # All architectures (3 baselines + custom + 4 ablations)
+│   ├── dataset.py         # AudioDataset with CQCC caching & augmentation
+│   ├── train.py           # Full training & evaluation pipeline
+│   ├── app.py             # FastAPI inference server (temporal analysis logic)
+│   └── models/            # .pth weights (Stored via Git LFS on Hugging Face)
 ├── frontend/
+│   ├── index.html         # UI Shell
+│   ├── script.js          # "Smart" URL switcher & visualization logic
+│   └── style.css          # Glassmorphism design system
+├── Dockerfile             # Production container config for Hugging Face
+└── requirements.txt       # Python dependencies
 ```
 ---
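
The project structure above lists `dataset.py` as handling CQCC caching. As a rough illustration of that pattern, the sketch below computes features once per audio file and reuses them from disk on later runs. It is an assumption-laden stand-in rather than the project's implementation: `cache_path` and `cached_features` are hypothetical names, and `pickle` is used here only to stay dependency-free, whereas the real code most likely stores tensors (for example with `torch.save`).

```python
import hashlib
import pickle
from pathlib import Path


def cache_path(audio_path, cache_dir):
    """Map an audio file to a stable cache filename derived from its path."""
    digest = hashlib.sha1(str(audio_path).encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.pkl"


def cached_features(audio_path, cache_dir, compute, force_rebuild=False):
    """Return features for `audio_path`, calling `compute` only on a cache miss."""
    target = cache_path(audio_path, cache_dir)
    if target.exists() and not force_rebuild:
        with open(target, "rb") as fh:
            return pickle.load(fh)
    features = compute(audio_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as fh:
        pickle.dump(features, fh)
    return features
```

Keying the cache on the file path keeps lookups cheap, but it will not notice if a file's contents change; hashing the file bytes instead would catch that at the cost of reading every file on each run.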