JunSiang26 committed on
Commit 2978fce · 1 Parent(s): 409c94a

Update README with live demo link

Files changed (1)
  1. README.md +52 -69
README.md CHANGED
@@ -12,106 +12,89 @@ pinned: false
  # OdioCheck - Deepfake Voice Detection AI
  *50.021 Artificial Intelligence Project*

- ## Theme
- **AI for Security & Social Good** (UN SDG #16: Peace, Justice, and Strong Institutions)
- OdioCheck tackles the rising threat of audio deepfakes used in scams and misdirection.

- ## Requirements Checklist
  - [x] **Fully functioning code:** Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
  - [x] **Baseline models (×3):**
- - **Wav2Vec2**: self-supervised transformer feature extractor (frozen) + attentive pooling classifier (`backend/models.py`)
- - **AASIST**: graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention (`backend/models.py`)
- - **CQCC Baseline**: standard CNN processing Constant-Q Cepstral Coefficients (`backend/models.py`)
- - [x] **SOTA Custom Model:** `ImprovedWav2Vec2CQCCDetector`, a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via **bidirectional cross-attention**, followed by a **Graph Attention** backend (`backend/models.py`).
- - [x] **Ablation Study (×4):** Four ablation variants systematically isolate each architectural component to validate the custom model design:
- - **Ablation 1**: Wav2Vec2 + Graph (no CQCC, no cross-attention)
- - **Ablation 2**: CQCC + Graph (no Wav2Vec2, no cross-attention)
- - **Ablation 3**: Wav2Vec2 + CQCC + Simple Concat + Graph (no cross-attention)
- - **Ablation 4**: Wav2Vec2 + CQCC + Cross-Attention + Linear (no Graph Attention)
- - [x] **Fully Working Frontend:** Glassmorphic UI (Tailwind + Vanilla JS) served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV. Shows **side-by-side** predictions from all four primary models with real-time animated confidence bars and a per-window **temporal analysis timeline chart**.
- - [x] **Cross-lingual Dataset Split:** Trained on English audio (`MLAAD-tiny/en`), tested on unseen German audio (`MLAAD-tiny/de`) for out-of-distribution generalisation evaluation.
- - [x] **CQCC Feature Caching:** Pre-computed CQCC tensors are cached to disk to avoid redundant computation across training runs.
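
The caching idea in the last checklist item can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: `cached_features`, `fake_cqcc`, and the one-pickle-file-per-clip layout are hypothetical, and the real pipeline caches tensors rather than Python lists.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def cached_features(audio_path: str, cache_dir: Path,
                    compute_fn: Callable[[str], Any],
                    force_rebuild: bool = False) -> Any:
    """Return features for audio_path, computing them only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache file on the audio path so each clip maps to one file.
    key = hashlib.sha256(audio_path.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.pkl"
    if cache_file.exists() and not force_rebuild:
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    features = compute_fn(audio_path)
    with cache_file.open("wb") as fh:
        pickle.dump(features, fh)
    return features


calls: List[str] = []


def fake_cqcc(path: str) -> List[float]:
    """Hypothetical stand-in for a CQCC extractor; records each invocation."""
    calls.append(path)
    return [float(len(path)), 0.0]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_features("clip_001.wav", Path(tmp), fake_cqcc)
    second = cached_features("clip_001.wav", Path(tmp), fake_cqcc)  # cache hit
```

The second call reads the stored result, so the extractor runs once per clip. Passing `force_rebuild=True` recomputes and overwrites the cached file.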

  ---

- ## Installation

- Ensure you have Python 3.9+ installed. Install all dependencies:
  ```bash
  pip install -r requirements.txt
  ```

- ### Dataset Download
- We use the `MLAAD-tiny` dataset (multi-language audio deepfakes). Download it from Hugging Face before training:
  ```bash
  pip install -U "huggingface_hub[cli]"
  huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
  ```

- ---
-
- ## Running the Project
-
- ### Step 1: (Optional) Pre-compute CQCC Cache
- Pre-computing CQCC features once dramatically speeds up all subsequent training runs:
- ```bash
- python backend/train.py --precompute-cqcc-only
- ```
-
- ### Step 2: Train All Models
- Trains all 4 primary models + 4 ablation variants, evaluates on the German test set, and saves `.pth` weights to `backend/models/`:
  ```bash
  python backend/train.py
  ```

- #### Available Training Flags
- | Flag | Default | Description |
- |---|---|---|
- | `--val-split F` | `0.2` | Fraction of English data reserved for validation (0–0.5). |
- | `--data-dir PATH` | auto | Override dataset root (must contain `original/` and `fake/` folders). |
- | `--cqcc-cache-dir PATH` | `backend/precomputed_features/cqcc` | Where to read/write cached CQCC tensors. |
- | `--precompute-cqcc-only` | `False` | Build CQCC cache and exit without training. |
- | `--force-rebuild-cqcc` | `False` | Recompute CQCC cache even if files already exist. |
- | `--smoke-test` | `False` | Run one forward pass through every model and exit; useful for verifying setup. |
-
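
For orientation, the flags in the table could be declared with `argparse` roughly as follows. This is a sketch that mirrors the table only; the actual parser in `backend/train.py` is not shown in this diff, and `build_parser`, `val_fraction`, the inclusive 0 to 0.5 bounds, and the use of `None` for the "auto" default are assumptions.

```python
import argparse


def val_fraction(text: str) -> float:
    """Parse --val-split and reject values outside the documented 0 to 0.5 range."""
    value = float(text)
    if not 0.0 <= value <= 0.5:
        raise argparse.ArgumentTypeError("--val-split must be between 0 and 0.5")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser reproducing the flags and defaults listed in the table."""
    parser = argparse.ArgumentParser(description="Train and evaluate the models")
    parser.add_argument("--val-split", type=val_fraction, default=0.2, metavar="F")
    # None stands in for the table's "auto" dataset-root detection.
    parser.add_argument("--data-dir", default=None, metavar="PATH")
    parser.add_argument("--cqcc-cache-dir", metavar="PATH",
                        default="backend/precomputed_features/cqcc")
    parser.add_argument("--precompute-cqcc-only", action="store_true")
    parser.add_argument("--force-rebuild-cqcc", action="store_true")
    parser.add_argument("--smoke-test", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--val-split", "0.1", "--smoke-test"])
```

Boolean flags use `store_true`, which gives the `False` defaults shown in the table without any extra code.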
- #### Quick Smoke Test
- Verify all models initialise and run a forward pass correctly without full training:
- ```bash
- python backend/train.py --smoke-test
- ```

- ### Step 3: Start the Web Interface
  ```bash
  uvicorn backend.app:app --reload
  ```
- Open **http://127.0.0.1:8000** in your browser. Upload any audio file (WAV, MP3, OGG, FLAC, M4A) to see simultaneous predictions from all four primary models plus an animated temporal confidence chart.

  ---

- ## Project Architecture
-
  ```
  AI Project/
  ├── backend/
- │   ├── models.py       # All model architectures (3 baselines + custom + 4 ablations)
- │   ├── dataset.py      # AudioDataset with CQCC caching + data augmentation
- │   ├── train.py        # Full training + evaluation pipeline (CLI-driven)
- │   ├── app.py          # FastAPI inference server (windowed temporal analysis)
- │   ├── preprocess.py   # Standalone preprocessing utilities
- │   └── models/         # Saved .pth weight files (generated after training)
  ├── frontend/
- │   ├── index.html      # Glassmorphic UI shell
- │   ├── script.js       # File upload, Chart.js timeline, model panel rendering
- │   └── style.css       # Custom glassmorphism styles
- ├── MLAAD-tiny/         # Dataset (downloaded separately)
- ├── requirements.txt    # Python dependencies
- └── colab_training_notebook.ipynb   # Google Colab training notebook
  ```

  ---
-
- ## Working with Other Datasets
- To replace MLAAD-tiny with another dataset (e.g., ASVspoof):
- 1. Place your `fake/` and `original/` (or `real/`) audio folders into a `data/` directory at the project root.
- 2. The `AudioDataset` in `dataset.py` auto-detects and falls back to the `data/` directory if `MLAAD-tiny` is absent.
- 3. Re-run `python backend/train.py`. The full pipeline runs identically.
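
The fallback described in step 2 might look something like the sketch below. It is illustrative only: `resolve_dataset_dirs` is a hypothetical helper, and it assumes both `MLAAD-tiny/` and `data/` expose `fake/` plus `original/` (or `real/`) directly under the root, which this diff does not confirm for `AudioDataset`.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_dataset_dirs(project_root: Path) -> Optional[Tuple[Path, Path]]:
    """Return (real_dir, fake_dir) from the first usable dataset root, else None."""
    # Prefer MLAAD-tiny; fall back to data/ when it is absent or incomplete.
    for root in (project_root / "MLAAD-tiny", project_root / "data"):
        fake_dir = root / "fake"
        if not fake_dir.is_dir():
            continue
        # Accept either naming convention for the genuine-audio folder.
        for real_name in ("original", "real"):
            real_dir = root / real_name
            if real_dir.is_dir():
                return real_dir, fake_dir
    return None


# Demo on a throwaway project root that only contains data/fake and data/real.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "fake").mkdir(parents=True)
    (root / "data" / "real").mkdir(parents=True)
    found = resolve_dataset_dirs(root)
    found_names = (found[0].name, found[1].name) if found else None
    found_root = found[0].parent.name if found else None
```

Returning `None` instead of raising leaves the caller free to print a download hint when neither root exists.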
 
  # OdioCheck - Deepfake Voice Detection AI
  *50.021 Artificial Intelligence Project*

+ [![Live Demo](https://img.shields.io/badge/Live%20Demo-Vercel-brightgreen?style=for-the-badge&logo=vercel)](https://odio-check.vercel.app/)
+ [![Backend](https://img.shields.io/badge/Backend-Hugging%20Face-yellow?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/JunSiang26/OdioCheck-Backend)

+ OdioCheck is a cutting-edge deepfake audio detection system designed to tackle the rising threat of voice clones used in scams and misinformation. It features a unique hybrid fusion architecture that outperforms standard SOTA baselines.
+
+ ## 🚀 Live Demo
+ **Web Interface:** [https://odio-check.vercel.app/](https://odio-check.vercel.app/)
+
+ ---
+
+ ## 🏗️ System Architecture
+ The project uses a **Hybrid Cloud** deployment to ensure high performance and scalability:
+ - **Frontend:** Hosted on **Vercel** for lightning-fast loading and smooth UI interactions.
+ - **Backend:** A **FastAPI** server running inside a **Docker** container on **Hugging Face Spaces**, providing the high RAM and CPU required for PyTorch model inference.
+ - **Model Storage:** Heavy `.pth` model weights (approx. 800 MB) are managed via **Git LFS** on Hugging Face to keep the source code repository lightweight.
+
+ ---
+
+ ## 🧠 Model Requirements Checklist
  - [x] **Fully functioning code:** Complete end-to-end PyTorch implementation from dataset loading to real-time inference via a web UI.
  - [x] **Baseline models (×3):**
+ - **Wav2Vec2**: self-supervised transformer feature extractor (frozen) + attentive pooling classifier.
+ - **AASIST**: graph-based SOTA baseline using sinc-filter frontend + spectro-temporal heterogeneous graph attention.
+ - **CQCC Baseline**: standard CNN processing Constant-Q Cepstral Coefficients.
+ - [x] **SOTA Custom Model:** `ImprovedWav2Vec2CQCCDetector`, a novel fusion architecture combining Wav2Vec 2.0 and CQCC features via **bidirectional cross-attention**, followed by a **Graph Attention** backend.
+ - [x] **Ablation Study (×4):** Four ablation variants systematically isolate each architectural component to validate the custom model design.
+ - [x] **Fully Working Frontend:** Glassmorphic UI served via FastAPI. Supports OGG/MP3/M4A/FLAC/WAV with real-time **temporal analysis timeline charts**.
+ - [x] **Cross-lingual Evaluation:** Trained on English audio, tested on unseen German audio (MLAAD-tiny) to evaluate out-of-distribution generalisation.
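
A timeline like the one the frontend charts can be produced by scoring overlapping slices of the waveform. The sketch below is an assumption about how such windowing might work, not the logic in `backend/app.py`: `window_scores`, the 1 s window, the 0.5 s hop, and the toy `mean_abs` scorer (standing in for a model's fake-probability) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def window_scores(samples: Sequence[float], sample_rate: int,
                  score_fn: Callable[[Sequence[float]], float],
                  window_s: float = 1.0,
                  hop_s: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start_time_in_seconds, score) for each sliding window."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must cover at least one sample")
    if not samples:
        return []
    # Clips shorter than one window still yield a single (shorter) window.
    last_start = max(len(samples) - window, 0)
    timeline = []
    for start in range(0, last_start + 1, hop):
        chunk = samples[start:start + window]
        timeline.append((start / sample_rate, score_fn(chunk)))
    return timeline


def mean_abs(chunk: Sequence[float]) -> float:
    """Toy scorer: mean absolute amplitude of the window."""
    return sum(abs(x) for x in chunk) / len(chunk)


# Toy clip at 4 samples per second: 2 s of silence, then 2 s of full amplitude.
signal = [0.0] * 8 + [1.0] * 8
timeline = window_scores(signal, sample_rate=4, score_fn=mean_abs)
```

Here the score rises from 0.0 to 1.0 across the window starting at 1.5 s, which is the kind of transition a per-window chart makes visible.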

  ---

+ ## 🛠️ Local Installation & Setup

+ ### 1. Install Dependencies
+ Ensure you have Python 3.9+ installed.
  ```bash
  pip install -r requirements.txt
  ```

+ ### 2. Dataset Download
+ Download the `MLAAD-tiny` dataset before training:
  ```bash
  pip install -U "huggingface_hub[cli]"
  huggingface-cli download mueller91/MLAAD-tiny --repo-type dataset --local-dir MLAAD-tiny
  ```

+ ### 3. Training & Evaluation
+ To train all 4 primary models and 4 ablation variants:
  ```bash
  python backend/train.py
  ```
+ *Weights will be saved to `backend/models/*.pth`.*

+ ---
+
+ ## 💻 Running the App Locally
+
+ ### Method A: Connect to Production Backend (Default)
+ By default, the frontend talks to the production backend. It detects when it is running on `localhost`, and you can switch it to your local backend in `frontend/script.js`.

+ ### Method B: Run Local Backend
  ```bash
  uvicorn backend.app:app --reload
  ```
+ Open **http://127.0.0.1:8000** to access the interface.

  ---

+ ## 📁 Project Structure
  ```
  AI Project/
  ├── backend/
+ │   ├── models.py       # All architectures (3 baselines + custom + 4 ablations)
+ │   ├── dataset.py      # AudioDataset with CQCC caching & augmentation
+ │   ├── train.py        # Full training & evaluation pipeline
+ │   ├── app.py          # FastAPI inference server (temporal analysis logic)
+ │   └── models/         # .pth weights (Stored via Git LFS on Hugging Face)
  ├── frontend/
+ │   ├── index.html      # UI Shell
+ │   ├── script.js       # "Smart" URL switcher & visualization logic
+ │   └── style.css       # Glassmorphism design system
+ ├── Dockerfile          # Production container config for Hugging Face
+ └── requirements.txt    # Python dependencies
  ```

  ---