---
title: Focus Guard Final v2
emoji: 🎯
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: "Focus detection - MediaPipe, MLP/XGB, L2CS, FastAPI"
---

# FocusGuard

Webcam-based focus detection: MediaPipe face mesh → 17 features (EAR, gaze, head pose, PERCLOS, etc.) → MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.

**Repository:** [KCL GAP project](https://github.kcl.ac.uk) (internal); adjust the link if you publish a public mirror.

## Project layout

```
├── data/                 collected_<name>/*.npz
├── data_preparation/     loaders, split, scale
├── notebooks/            MLP/XGB training + LOPO
├── models/               face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py   9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py    Fuses calibrated gaze with eye openness
│   └── L2CS-Net/             In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/          mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/           logs, plots, justify_thresholds
├── ui/                   pipeline.py, live_demo.py
├── src/                  React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx      Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx  Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js    WebSocket client, frame capture, canvas rendering
├── static/               built frontend (after npm run build)
├── main.py, app.py       FastAPI backend
├── requirements.txt
└── package.json
```

## Config

Hyperparameters and app settings live in `config/default.yaml` (learning rates, batch size, thresholds, L2CS weights, etc.). Override with env `FOCUSGUARD_CONFIG` pointing to another YAML.
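
The override rule can be sketched as a small path resolver. This is a minimal illustration, not the project's actual loader: the function name `resolve_config_path` is hypothetical, and only the env variable name and the default path come from this README.

```python
import os
from pathlib import Path

# Default location and env variable name are taken from this README.
DEFAULT_CONFIG = Path("config/default.yaml")


def resolve_config_path(env=None):
    """Return the YAML path to load: FOCUSGUARD_CONFIG if set, else the default.

    `env` is injectable so the rule can be checked without touching os.environ.
    """
    env = os.environ if env is None else env
    override = env.get("FOCUSGUARD_CONFIG", "").strip()
    # An unset or blank variable falls back to the default config.
    return Path(override) if override else DEFAULT_CONFIG


if __name__ == "__main__":
    print(resolve_config_path())
```

With a resolver like this, `FOCUSGUARD_CONFIG=config/experiment.yaml python -m models.mlp.train` would train against the alternative file (`config/experiment.yaml` is a made-up name for illustration).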

## Setup

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

To rebuild the frontend after changes:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

## Run

**Web app:** Activate the venv and run uvicorn via `python -m` so it uses the venv's dependencies (otherwise you may get `ModuleNotFoundError: No module named 'aiosqlite'`):

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Then open http://localhost:7860.

**Frontend dev server (optional, for React development):**

```bash
npm run dev
```

**OpenCV demo:**

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb
```

**Train:**

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

### ClearML experiment tracking

All training and evaluation config (from `config/default.yaml`) is exposed as ClearML task parameters. Enable logging with `USE_CLEARML=1`; optionally run on a **remote GPU agent** instead of locally:

```bash
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m evaluation.justify_thresholds --clearml
```

The script enqueues the task and exits; a `clearml-agent` listening on the named queue (e.g. `gpu`) runs the same command with the same parameters. Start an agent with:

```bash
clearml-agent daemon --queue gpu
```

Logged to ClearML: **parameters** (full flattened config), **scalars** (loss, accuracy, F1, ROC-AUC, per-class precision/recall/F1, dataset sizes and class counts), **artifacts** (best checkpoint, training log JSON), and **plots** (confusion matrix, ROC curves in evaluation).

## Data

9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.

**Train/val/test split:** All pooled training and evaluation use the same split for reproducibility. The test set is held out before any preprocessing; `StandardScaler` is fit on the training set only, then applied to val and test. Split ratios and random seed come from `config/default.yaml` (`data.split_ratios`, `mlp.seed`) via `data_preparation.prepare_dataset.get_default_split_config()`. MLP train, XGBoost train, eval_accuracy scripts, and benchmarks all use this single source so reported test accuracy is on the same held-out set.
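
The order of operations (split first, then fit the scaler on train only) is sketched below without dependencies. It is an illustration under assumptions, not the code in `data_preparation`: the 70/15/15 ratios and seed 42 are placeholders for whatever `config/default.yaml` holds, and `fit_scaler`/`transform` are a plain-Python stand-in for `StandardScaler`.

```python
import random
from statistics import mean, pstdev


def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=42):
    """Shuffle 0..n-1 once with a fixed seed and cut into train/val/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_scaler(train_rows):
    """Per-feature mean and population std, computed from training rows only."""
    cols = list(zip(*train_rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant feature: divide by 1
    return means, stds


def transform(rows, means, stds):
    """Apply the train-set statistics to any split (train, val or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

Because the statistics never see val or test rows, the held-out accuracy is not inflated by leakage from the scaler.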

## Models

| Model | What it uses | Best for |
|-------|-------------|----------|
| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| **MLP** | Neural network on same features (64→32) | Higher accuracy |
| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |

## Model numbers (15% test split)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64→32) | 92.92% | 0.929 | 0.971 |

## Model numbers (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 @ best threshold | F1 @ 0.50 |
|-------|----------|------------------------------|---------------------|------------|
| MLP | 0.8624 | 0.228 | 0.8578 | 0.8149 |
| XGBoost | 0.8695 | 0.280 | 0.8549 | 0.8324 |
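
Youden's J is `TPR - FPR`, and the "best threshold" column is the score cut-off that maximises it. The sketch below shows that selection on plain lists; it is a simple quadratic-time illustration with a hypothetical function name, not the implementation in `evaluation.justify_thresholds`.

```python
def youden_best_threshold(y_true, scores):
    """Return (threshold, J) maximising J = TPR - FPR over observed scores.

    A sample is predicted positive when score >= threshold. Ties keep the
    lowest threshold. O(n^2), which is fine for an illustration.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both classes to compute Youden's J")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

A best threshold well below 0.50, as in the table, means the classifier's raw probabilities for the positive class tend to sit low on unseen participants, so the default cut-off gives up recall.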

From the latest `python -m evaluation.justify_thresholds` run:

- Best geometric face weight (`alpha`) = `0.7` (mean LOPO F1 = `0.8195`)
- Best hybrid MLP weight (`w_mlp`) = `0.3` (mean LOPO F1 = `0.8409`)
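
One plausible reading of the two weights is a pair of convex combinations. This is an assumption, not the project's confirmed formula: `s_eye` is a hypothetical companion to the `s_face` feature, and both function names are illustrative.

```python
ALPHA = 0.7  # geometric face weight reported above
W_MLP = 0.3  # hybrid MLP weight reported above


def geometric_score(s_face, s_eye, alpha=ALPHA):
    """Assumed form: face score and eye score blended by alpha."""
    return alpha * s_face + (1.0 - alpha) * s_eye


def hybrid_score(p_mlp, s_face, s_eye, w_mlp=W_MLP, alpha=ALPHA):
    """Assumed form: MLP probability blended with the geometric score."""
    geo = geometric_score(s_face, s_eye, alpha)
    return w_mlp * p_mlp + (1.0 - w_mlp) * geo
```

Under this reading, `w_mlp = 0.3` means the geometric score carries 70% of the hybrid decision.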

## Grouped vs pooled benchmark

Latest quick benchmark (`python -m evaluation.grouped_split_benchmark --quick`) shows the expected gap between pooled random split and person-held-out LOPO:

| Protocol | Accuracy | F1 (weighted) | ROC-AUC |
|----------|---------:|--------------:|--------:|
| Pooled random split | 0.9510 | 0.9507 | 0.9869 |
| Grouped LOPO (9 folds) | 0.8303 | 0.8304 | 0.8801 |

This is why LOPO is the primary generalisation metric for reporting.
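
The difference between the two protocols comes down to how indices are grouped. A minimal sketch of leave-one-participant-out folds is shown below; the function name is hypothetical, and scikit-learn's `LeaveOneGroupOut` offers the same behaviour.

```python
def lopo_folds(groups):
    """Yield (held_out, train_idx, test_idx), one fold per participant.

    `groups[i]` is the participant label of sample i. Every sample from the
    held-out participant goes to the test side, so no person appears on both
    sides of a fold.
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx
```

With 9 participants this yields the 9 folds in the table; a pooled random split, by contrast, lets frames from the same person land in both train and test.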

## Feature ablation snapshot

Latest quick feature-selection run (`python -m evaluation.feature_importance --quick --skip-lofo`):

| Subset | Mean LOPO F1 |
|--------|--------------|
| all_10 | 0.8286 |
| eye_state | 0.8071 |
| head_pose | 0.7480 |
| gaze | 0.7260 |

Top-5 XGBoost gain features: `s_face`, `ear_right`, `head_deviation`, `ear_avg`, `perclos`.
For full leave-one-feature-out ablation, run `python -m evaluation.feature_importance` (slower).

## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.

### Standalone mode
Select **L2CS** as the model; it handles everything.

### Boost mode
Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:

- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score
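
The fusion rule in the list above can be sketched as follows. The 35%/65% weights come from this README; the veto cut-off `VETO_BELOW` and the function name are hypothetical, since the README does not say how "clearly off-screen" is measured.

```python
BASE_WEIGHT = 0.35  # base model share, from this README
GAZE_WEIGHT = 0.65  # L2CS share, from this README
VETO_BELOW = 0.2    # hypothetical cut-off for "clearly off-screen"


def boost_score(base_score, gaze_score, veto_below=VETO_BELOW):
    """Fuse two focus scores in [0, 1] into one.

    A gaze score under the cut-off overrides the base model entirely.
    """
    if gaze_score < veto_below:
        return 0.0  # veto: unfocused regardless of the base score
    return BASE_WEIGHT * base_score + GAZE_WEIGHT * gaze_score
```

The veto is what makes boost mode catch the case where the head faces the screen (high base score) while the eyes have left it.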

### Calibration
After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:

1. A fullscreen overlay shows 9 target dots (3×3 grid)
2. Look at each dot as the progress ring fills
3. The first dot (centre) sets your baseline gaze offset
4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
5. A cyan tracking dot appears on the video showing where you're looking
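
Step 4 can be illustrated with a dependency-free least-squares fit. This is a sketch under assumptions, not the code in `models/gaze_calibration.py`: the second-order basis is a guess at the polynomial's form, the angles are assumed to be already corrected by the centre-dot offset, and all function names are hypothetical.

```python
def poly_terms(yaw, pitch):
    """Assumed second-order polynomial basis in the two gaze angles."""
    return [1.0, yaw, pitch, yaw * pitch, yaw * yaw, pitch * pitch]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_axis(angles, targets):
    """Least-squares coefficients for one screen axis (normal equations)."""
    rows = [poly_terms(y, p) for y, p in angles]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def predict(coeffs, yaw, pitch):
    """Map a gaze angle pair to one screen coordinate."""
    return sum(c * t for c, t in zip(coeffs, poly_terms(yaw, pitch)))
```

One `fit_axis` call per screen axis (x and y) gives the full mapping; nine samples against six coefficients leaves a little redundancy to absorb noise in the recorded angles.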

## Pipeline

1. Face mesh (MediaPipe 478 pts)
2. Head pose → yaw, pitch, roll, scores, gaze offset
3. Eye scorer → EAR, gaze ratio, MAR
4. Temporal → PERCLOS, blink rate, yawn
5. 10-d vector → MLP or XGBoost → focused / unfocused
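
Two of the features named in steps 3 and 4 can be sketched briefly. EAR below uses the standard six-landmark formula; which MediaPipe landmark indices the project feeds into it is not stated here. The PERCLOS window length and closed-eye threshold are hypothetical defaults, and the class name is illustrative.

```python
from collections import deque
from math import dist


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six (x, y) eye landmarks.

    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 sit below them on the lower lid.
    """
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))


class Perclos:
    """Fraction of recent frames whose EAR is below a closed-eye threshold."""

    def __init__(self, window=900, closed_below=0.2):
        # Both defaults are assumptions (900 frames is 30 s at 30 fps).
        self.closed = deque(maxlen=window)
        self.closed_below = closed_below

    def update(self, ear):
        """Record one frame and return the current PERCLOS in [0, 1]."""
        self.closed.append(ear < self.closed_below)
        return sum(self.closed) / len(self.closed)
```

EAR drops towards zero as the eyelids close, so a rolling share of low-EAR frames gives a drowsiness signal that a single frame cannot.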

**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.