---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---

# FocusGuard

Real-time webcam-based visual attention estimation. MediaPipe Face Mesh provides 478 facial landmarks per frame, from which 17 features (EAR, gaze ratios, head pose, PERCLOS) are computed; 10 are selected and routed through an MLP or XGBoost classifier for binary focused/unfocused prediction. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.

![Real-time focus detection with face mesh and XGBoost classification](assets/focusguard-demo.gif)

---

## Team

**Team name:** FocusGuards (5CCSAGAP Large Group Project)

**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas

---

## Links

### Project access

- Git repository: [GAP_Large_project](https://github.kcl.ac.uk/k23172173/GAP_Large_project)
- Deployed app (Hugging Face): [FocusGuard/final_v2](https://huggingface.co/spaces/FocusGuard/final_v2)
- ClearML experiments: [FocusGuards Large Group Project](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments)

### Data and checkpoints

- Checkpoints (Google Drive): [Download folder](https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link)
- Dataset (Google Drive): [Dataset folder](https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing)
- Data consent form (PDF): [Consent document](https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link)

The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).

---

## Trained models

Model checkpoints are **not included** in the submission archive. Download them before running inference.

### Option 1: Hugging Face Space

Pre-trained checkpoints are available in the Hugging Face Space files:

```
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
```

Download and place into `checkpoints/`:

| File | Description |
|------|-------------|
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |

### Option 2: ClearML

Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".

| Model | Task ID | Model ID |
|-------|---------|----------|
| MLP | `3899b5aa0c3348b28213a3194322cdf7` | `56f94b799f624bdc845fa50c4d0606fe` |
| XGBoost | `c0ceb8e7e8194a51a7a31078cc47775c` | `6727b8de334f4ca0961c46b436f6fb7c` |

**UI:** Open a task on the [experiments page](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments), go to Artifacts > Output Models, and download.

**Python:**

```python
from clearml import Model

mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy()   # downloads .pt

xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy()   # downloads .json
```

Copy the downloaded files into `checkpoints/`.

### Option 3: Google Drive (submission fallback)

If ClearML access is restricted, download checkpoints from:
https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link

Place all files under `checkpoints/`.

### Option 4: Retrain from scratch

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and scalers. Requires training data under `data/collected_*/`.

---

## Project layout

```
config/
    default.yaml              hyperparameters, thresholds, app settings
    __init__.py               config loader + ClearML flattener
    clearml_enrich.py         ClearML task enrichment + artifact upload
data_preparation/
    prepare_dataset.py        load/split/scale .npz files (pooled + LOPO)
    data_exploration.ipynb    EDA: distributions, class balance, correlations
models/
    face_mesh.py              MediaPipe 478-point face landmarks
    head_pose.py              yaw/pitch/roll via solvePnP, face-orientation score
    eye_scorer.py             EAR, MAR, gaze ratios, PERCLOS
    collect_features.py       real-time feature extraction + webcam labelling CLI
    gaze_calibration.py       9-point polynomial gaze calibration
    gaze_eye_fusion.py        fuses calibrated gaze with eye openness
    mlp/                      MLP training, eval, Optuna sweep
    xgboost/                  XGBoost training, eval, ClearML + Optuna sweeps
    L2CS-Net/                 vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/                  (excluded from archive; see download instructions above)
notebooks/
    mlp.ipynb                 MLP training + LOPO in Jupyter
    xgboost.ipynb             XGBoost training + LOPO in Jupyter
evaluation/
    justify_thresholds.py     LOPO threshold + weight grid search
    feature_importance.py     XGBoost gain + leave-one-feature-out ablation
    grouped_split_benchmark.py  pooled vs LOPO comparison
    plots/                    ROC curves, confusion matrices, weight searches
    logs/                     JSON training logs
tests/
    test_*.py                 unit + integration tests (pytest)
    .coveragerc               coverage config
ui/
    pipeline.py               all 5 pipeline classes + output smoothing
    live_demo.py              OpenCV webcam demo
src/                          React (Vite) frontend source
static/                       built frontend assets (after npm build)
main.py                       FastAPI application entry point
package.json                  frontend package manifest
requirements.txt
pytest.ini
```

---

## Setup

Recommended versions:

- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)

```bash
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then download checkpoints (see above).

If you need to rebuild frontend assets locally:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

---

## Run

### Local OpenCV demo

```bash
python ui/live_demo.py          # default pipeline
python ui/live_demo.py --xgb    # XGBoost
```

Controls: `m` cycle mesh overlay, `1-5` switch pipeline mode, `q` quit.

### Web app (without Docker)

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Open http://localhost:7860
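
The browser streams frames to the backend over a WebSocket (see the intro). A minimal sketch of what such an endpoint looks like in FastAPI; the `/ws/video` path and the JSON reply shape are illustrative assumptions, not the app's actual contract (see `main.py` for the real routes):

```python
# Sketch of a frame-streaming WebSocket endpoint (illustrative only).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/video")  # hypothetical path
async def video_ws(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            frame_bytes = await ws.receive_bytes()   # one encoded frame
            # ... decode the frame, run the focus pipeline on it ...
            await ws.send_json({"focused": True, "score": 0.87})
    except WebSocketDisconnect:
        pass
```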


### Web app (Docker)

```bash
docker-compose up               # serves on port 7860
```

---

## Data collection

```bash
python -m models.collect_features --name <participant>
```

Records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_<participant>/` as `.npz` files. Raw video is never stored.
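
Collected sessions can be inspected with NumPy. A minimal sketch; the array names inside each `.npz` are assumptions here, so check `models/collect_features.py` for the actual keys:

```python
import glob
import numpy as np

for path in glob.glob("data/collected_*/*.npz"):
    session = np.load(path)
    print(path, session.files)        # list the stored arrays
    # Hypothetical keys; the real names are defined in collect_features.py:
    # X = session["features"]         # (n_frames, 17) feature matrix
    # y = session["labels"]           # (n_frames,) 0/1 focus labels
```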

Nine participants recorded 5-10 minute sessions across varied environments (144,793 frames total; 61.5% focused, 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.

- Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link
- Raw participant dataset (excluded from this submission for coursework policy and privacy reasons; can be shared with module staff on request): https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing

---

## Pipeline

```
Webcam frame
  --> MediaPipe Face Mesh (478 landmarks)
    --> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
    --> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
    --> Gaze ratios: h_gaze, v_gaze, gaze_offset
    --> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
  --> 17 features --> select 10 --> clip to physiological bounds
  --> ML model (MLP / XGBoost) or geometric scorer
  --> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
  --> FOCUSED / UNFOCUSED
```
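
The asymmetric EMA uses a different rate depending on whether the score is rising or falling, which damps flicker from single bad frames. A minimal sketch of the smoothing step, assuming `alpha_up` applies when the score rises (the production version lives in `ui/pipeline.py`):

```python
class AsymmetricEMA:
    """Exponential moving average with separate up/down rates."""

    def __init__(self, alpha_up: float = 0.55, alpha_down: float = 0.45):
        self.alpha_up = alpha_up
        self.alpha_down = alpha_down
        self.value: float | None = None

    def update(self, score: float) -> float:
        if self.value is None:
            self.value = score
        else:
            # Faster alpha when the score rises, slower when it falls.
            alpha = self.alpha_up if score > self.value else self.alpha_down
            self.value = alpha * score + (1 - alpha) * self.value
        return self.value
```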

Five runtime modes share the same feature extraction backbone:

| Mode | Description |
|------|-------------|
| **Geometric** | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| **XGBoost** | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| **MLP** | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| **Hybrid** | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| **L2CS** | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |

Any base mode can be combined with L2CS Boost (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces a near-zero L2CS score via cosine decay, acting as a soft veto.
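
The cosine-decay scoring and the boost fusion reduce to a few lines. A sketch using the constants above; the exact decay shape and clamping are assumptions, and `ui/pipeline.py` is authoritative:

```python
import math

MAX_ANGLE = 22.0  # degrees; deviation at or beyond this scores 0

def cosine_decay(angle_deg: float, max_angle: float = MAX_ANGLE) -> float:
    """Smoothly map an angular deviation to a [0, 1] score."""
    a = min(abs(angle_deg), max_angle)
    return 0.5 * (1 + math.cos(math.pi * a / max_angle))

def geometric_score(s_face: float, s_eye: float) -> float:
    return 0.7 * s_face + 0.3 * s_eye

def l2cs_boost(base_score: float, l2cs_score: float) -> bool:
    """Score-level fusion: off-screen gaze drives l2cs_score toward 0,
    which pulls the fused score under the 0.52 threshold (soft veto)."""
    fused = 0.35 * base_score + 0.65 * l2cs_score
    return fused >= 0.52  # True -> FOCUSED
```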

---

## Training

Both scripts read all hyperparameters from `config/default.yaml`.

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).

### ClearML experiment tracking

```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```

Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
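
Under the hood this follows the standard ClearML pattern. A minimal sketch of the general shape; the task name and scalar titles here are illustrative, not the exact ones used by `config/clearml_enrich.py`:

```python
from clearml import Task

task = Task.init(
    project_name="FocusGuards Large Group Project",
    task_name="mlp-train",                # illustrative name
)
task.connect({"lr": 1e-3, "epochs": 50})  # hyperparameters appear in the UI

logger = task.get_logger()
for epoch in range(50):
    val_f1 = 0.0  # ... computed from the validation set ...
    logger.report_scalar("f1", "val", value=val_f1, iteration=epoch)
```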

Reference experiment IDs:

| Model | ClearML experiment ID |
|-------|------------------------|
| MLP (`models.mlp.train`) | `3899b5aa0c3348b28213a3194322cdf7` |
| XGBoost (`models.xgboost.train`) | `c0ceb8e7e8194a51a7a31078cc47775c` |

---

## Evaluation

```bash
python -m evaluation.justify_thresholds          # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark     # pooled vs LOPO comparison
python -m evaluation.feature_importance          # XGBoost gain + LOFO ablation
```

### Results (pooled random split, 15% test)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |

### Results (LOPO, 9 participants)

| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|-------|----------|-----------------------------|----------------------|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |

Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820).
Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).

The ~12 pp drop from pooled to LOPO reflects temporal data leakage in the pooled split, where frames from the same session land in both train and test, and confirms LOPO as the primary generalisation metric.
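
LOPO is simply a grouped split with one group per participant, which scikit-learn's `LeaveOneGroupOut` expresses directly. A runnable sketch with dummy data (the project's own split lives in `data_preparation/prepare_dataset.py`):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 10))          # per-frame feature vectors
y = rng.integers(0, 2, size=900)        # focused / unfocused labels
groups = np.repeat(np.arange(9), 100)   # participant id per frame

logo = LeaveOneGroupOut()               # one fold per participant
for train_idx, test_idx in logo.split(X, y, groups):
    held_out = groups[test_idx][0]
    # train on 8 participants, evaluate on the held-out one
    print(f"fold: test participant {held_out}, "
          f"{len(train_idx)} train / {len(test_idx)} test frames")
```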

### Feature ablation

| Channel subset | Mean LOPO F1 |
|----------------|-------------|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |

Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).

---

## L2CS Gaze Tracking

L2CS-Net predicts where the eyes are looking, not just where the head is pointing, catching the scenario where the head faces the screen but the eyes wander.

**Standalone mode:** Select L2CS as the model.

**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.

**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
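
The degree-2 mapping is an ordinary least-squares fit over quadratic terms of the gaze angles. A sketch with the outlier filtering and bias correction omitted; `models/gaze_calibration.py` has the real version:

```python
import numpy as np

def quad_features(yaw: np.ndarray, pitch: np.ndarray) -> np.ndarray:
    """Design matrix [1, yaw, pitch, yaw^2, yaw*pitch, pitch^2]."""
    return np.stack(
        [np.ones_like(yaw), yaw, pitch, yaw**2, yaw * pitch, pitch**2],
        axis=1,
    )

def fit_calibration(yaw, pitch, screen_xy):
    """Fit gaze angles (9 samples) -> screen coordinates (9, 2)."""
    A = quad_features(np.asarray(yaw), np.asarray(pitch))
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(screen_xy), rcond=None)
    return coeffs                        # (6, 2) coefficient matrix

def predict_screen(coeffs, yaw, pitch):
    return quad_features(np.atleast_1d(yaw), np.atleast_1d(pitch)) @ coeffs
```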

L2CS weight lookup order in runtime:

1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`

---

## Config

All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`.
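
The override follows the usual environment-variable pattern. A minimal sketch of the idea; the actual loader in `config/__init__.py` is authoritative:

```python
import os
import yaml

def load_config() -> dict:
    """Load config/default.yaml unless FOCUSGUARD_CONFIG points elsewhere."""
    path = os.environ.get("FOCUSGUARD_CONFIG", "config/default.yaml")
    with open(path) as f:
        return yaml.safe_load(f)
```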

---

## Tests

Included checks:

- data prep helpers and real split consistency (`test_data_preparation.py`; split test **skips** if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)

```bash
pytest
```

Coverage is enabled by default via `pytest.ini` (`--cov` / term report). For HTML coverage: `pytest --cov-report=html`.

**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.