---
title: Focus Guard Final v2
emoji: 🎯
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: "Focus detection: MediaPipe, MLP/XGB, L2CS, FastAPI"
---
# FocusGuard
Webcam-based focus detection: MediaPipe face mesh → 10 features (EAR, gaze, head pose, PERCLOS, etc.) → MLP or XGBoost for focused/unfocused. React + FastAPI app with WebSocket video.
**Repository:** [KCL GAP project](https://github.kcl.ac.uk) (internal) β adjust link if you publish a public mirror.
## Project layout
```
├── data/                  collected_<name>/*.npz
├── data_preparation/      loaders, split, scale
├── notebooks/             MLP/XGB training + LOPO
├── models/                face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py    9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py     Fuses calibrated gaze with eye openness
│   └── L2CS-Net/              In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/           mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/            logs, plots, justify_thresholds
├── ui/                    pipeline.py, live_demo.py
├── src/                   React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx       Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx   Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js     WebSocket client, frame capture, canvas rendering
├── static/                built frontend (after npm run build)
├── main.py, app.py        FastAPI backend
├── requirements.txt
└── package.json
```
## Config
Hyperparameters and app settings live in `config/default.yaml` (learning rates, batch size, thresholds, L2CS weights, etc.). Override with env `FOCUSGUARD_CONFIG` pointing to another YAML.
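A minimal sketch of how that override could be resolved (assumes PyYAML; `load_config` is an illustrative helper, not necessarily the project's actual loader):
```python
import os
import yaml  # PyYAML

def load_config(default_path: str = "config/default.yaml") -> dict:
    """Load settings from the default YAML unless FOCUSGUARD_CONFIG points elsewhere."""
    path = os.environ.get("FOCUSGUARD_CONFIG", default_path)
    with open(path, "r") as f:
        return yaml.safe_load(f)

cfg = load_config()
# Keys referenced later in this README, e.g. split ratios and seed:
print(cfg["data"]["split_ratios"], cfg["mlp"]["seed"])
```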
## Setup
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
To rebuild the frontend after changes:
```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
## Run
**Web app:** Activate the venv and run uvicorn via `python -m` so it picks up the venv's dependencies (otherwise you'll get `ModuleNotFoundError: aiosqlite`):
```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```
Then open http://localhost:7860.
**Frontend dev server (optional, for React development):**
```bash
npm run dev
```
**OpenCV demo:**
```bash
python ui/live_demo.py         # live OpenCV demo with the default model
python ui/live_demo.py --xgb   # same demo with the XGBoost classifier
```
**Train:**
```bash
python -m models.mlp.train
python -m models.xgboost.train
```
### ClearML experiment tracking
All training and evaluation config (from `config/default.yaml`) is exposed as ClearML task parameters. Enable logging with `USE_CLEARML=1`; optionally run on a **remote GPU agent** instead of locally:
```bash
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m evaluation.justify_thresholds --clearml
```
The script enqueues the task and exits; a `clearml-agent` listening on the named queue (e.g. `gpu`) runs the same command with the same parameters. Start an agent with:
```bash
clearml-agent daemon --queue gpu
```
Logged to ClearML: **parameters** (full flattened config), **scalars** (loss, accuracy, F1, ROC-AUC, per-class precision/recall/F1, dataset sizes and class counts), **artifacts** (best checkpoint, training log JSON), and **plots** (confusion matrix, ROC curves in evaluation).
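A rough sketch of this pattern using the public ClearML API (project/task names and config values below are placeholders, not the scripts' actual values):
```python
import os
from clearml import Task

flat_config = {"mlp.lr": 1e-3, "mlp.batch_size": 256}  # stand-in for the flattened YAML config

if os.environ.get("USE_CLEARML") == "1":
    task = Task.init(project_name="FocusGuard", task_name="mlp_train")  # placeholder names
    task.connect(flat_config)  # exposed as editable task parameters in the ClearML UI
    queue = os.environ.get("CLEARML_QUEUE")
    if queue:
        # Enqueue on the named queue and stop the local run; a clearml-agent
        # listening on that queue re-executes the script with the same parameters.
        task.execute_remotely(queue_name=queue, exit_process=True)
    # During training, scalars and artifacts are reported roughly like this:
    task.get_logger().report_scalar(title="val", series="f1", value=0.93, iteration=1)
    task.upload_artifact(name="best_checkpoint", artifact_object="checkpoints/mlp_best.pt")
```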
## Data
9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.
**Train/val/test split:** All pooled training and evaluation use the same split for reproducibility. The test set is held out before any preprocessing; `StandardScaler` is fit on the training set only, then applied to val and test. Split ratios and random seed come from `config/default.yaml` (`data.split_ratios`, `mlp.seed`) via `data_preparation.prepare_dataset.get_default_split_config()`. MLP train, XGBoost train, eval_accuracy scripts, and benchmarks all use this single source so reported test accuracy is on the same held-out set.
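The same split discipline, sketched with scikit-learn (arrays, ratios, and seed below are illustrative stand-ins; the real values come from `config/default.yaml`):
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)  # stand-in for the pooled dataset
seed = 42                                                       # illustrative; real value is mlp.seed

# Hold out the test set first, then carve validation out of the remainder.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, random_state=seed, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15 / 0.85, random_state=seed, stratify=y_trainval)

# Fit the scaler on training data only; apply the same transform to val and test.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = map(scaler.transform, (X_train, X_val, X_test))
```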
## Models
| Model | What it uses | Best for |
|-------|-------------|----------|
| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| **MLP** | Neural network on same features (64→32) | Higher accuracy |
| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |
## Model numbers (15% test split)
| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64→32) | 92.92% | 0.929 | 0.971 |
## Model numbers (LOPO, 9 participants)
| Model | LOPO AUC | Best threshold (Youden's J) | F1 @ best threshold | F1 @ 0.50 |
|-------|----------|------------------------------|---------------------|------------|
| MLP | 0.8624 | 0.228 | 0.8578 | 0.8149 |
| XGBoost | 0.8695 | 0.280 | 0.8549 | 0.8324 |
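Youden's J simply picks the ROC point that maximises TPR - FPR; a small sketch with scikit-learn (labels and scores are placeholders):
```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                     # placeholder labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.3])   # placeholder model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)            # Youden's J = TPR - FPR
print("best threshold:", thresholds[best])
```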
From the latest `python -m evaluation.justify_thresholds` run:
- Best geometric face weight (`alpha`) = `0.7` (mean LOPO F1 = `0.8195`)
- Best hybrid MLP weight (`w_mlp`) = `0.3` (mean LOPO F1 = `0.8409`)
## Grouped vs pooled benchmark
Latest quick benchmark (`python -m evaluation.grouped_split_benchmark --quick`) shows the expected gap between pooled random split and person-held-out LOPO:
| Protocol | Accuracy | F1 (weighted) | ROC-AUC |
|----------|---------:|--------------:|--------:|
| Pooled random split | 0.9510 | 0.9507 | 0.9869 |
| Grouped LOPO (9 folds) | 0.8303 | 0.8304 | 0.8801 |
This is why LOPO is the primary generalisation metric for reporting.
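LOPO here is leave-one-participant-out: each fold trains on 8 participants and evaluates on the held-out one. A sketch with scikit-learn's `LeaveOneGroupOut` (classifier and data are placeholders, not the project's models):
```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # placeholder for MLP/XGBoost
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.rand(900, 10)
y = np.random.randint(0, 2, 900)
groups = np.repeat(np.arange(9), 100)   # one group id per participant

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], clf.predict(X[test_idx]), average="weighted"))
print("mean LOPO F1:", np.mean(scores))
```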
## Feature ablation snapshot
Latest quick feature-selection run (`python -m evaluation.feature_importance --quick --skip-lofo`):
| Subset | Mean LOPO F1 |
|--------|--------------|
| all_10 | 0.8286 |
| eye_state | 0.8071 |
| head_pose | 0.7480 |
| gaze | 0.7260 |
Top-5 XGBoost gain features: `s_face`, `ear_right`, `head_deviation`, `ear_avg`, `perclos`.
For full leave-one-feature-out ablation, run `python -m evaluation.feature_importance` (slower).
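The gain ranking can be read straight off a trained booster; a sketch with the xgboost sklearn wrapper (the data is random and the last five feature names are placeholders):
```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

feature_names = ["s_face", "ear_right", "head_deviation", "ear_avg", "perclos",
                 "f6", "f7", "f8", "f9", "f10"]     # last five are placeholders
X = pd.DataFrame(np.random.rand(500, 10), columns=feature_names)
y = np.random.randint(0, 2, 500)

model = XGBClassifier(n_estimators=600, max_depth=8).fit(X, y)
gain = model.get_booster().get_score(importance_type="gain")
print(sorted(gain.items(), key=lambda kv: kv[1], reverse=True)[:5])   # top-5 features by gain
```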
## L2CS Gaze Tracking
L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.
### Standalone mode
Select **L2CS** as the model; it handles everything.
### Boost mode
Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model (see the sketch after this list):
- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects gaze is clearly off-screen, it **vetoes** the base model regardless of score
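A plain-Python sketch of that weighting and veto (the function name and off-screen flag are hypothetical; the real pipeline code may differ):
```python
def fused_focus_score(base_score: float, gaze_score: float, gaze_off_screen: bool) -> float:
    """Combine the base model (head pose + eye openness) with L2CS gaze in boost mode."""
    if gaze_off_screen:
        return 0.0                        # veto: clearly off-screen gaze overrides the base model
    return 0.35 * base_score + 0.65 * gaze_score

print(fused_focus_score(0.9, 0.8, gaze_off_screen=False))  # 0.835
print(fused_focus_score(0.9, 0.8, gaze_off_screen=True))   # 0.0
```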
### Calibration
After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:
1. A fullscreen overlay shows 9 target dots (3×3 grid)
2. Look at each dot as the progress ring fills
3. The first dot (centre) sets your baseline gaze offset
4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates (see the sketch after these steps)
5. A cyan tracking dot appears on the video showing where you're looking
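A minimal sketch of such a 9-point fit using polynomial regression (scikit-learn here; the actual calibration code and polynomial degree may differ):
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Placeholder calibration data: gaze angles (yaw, pitch) recorded at the 9 targets,
# and the known on-screen coordinates of those targets (3x3 grid, normalised).
gaze_angles = np.random.rand(9, 2)                        # stand-in for measured (yaw, pitch)
screen_xy = np.array([[x, y] for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)])

mapper = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
mapper.fit(gaze_angles, screen_xy)                        # angles -> screen coordinates

print(mapper.predict(gaze_angles[:1]))                    # where the tracking dot would land
```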
## Pipeline
1. Face mesh (MediaPipe 478 pts)
2. Head pose → yaw, pitch, roll, scores, gaze offset
3. Eye scorer → EAR, gaze ratio, MAR
4. Temporal → PERCLOS, blink rate, yawn
5. 10-d vector → MLP or XGBoost → focused / unfocused
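A hedged sketch of the final step: a scaled 10-d feature vector through an MLP head (the 10 → 64 → 32 → 1 layout mirrors the description above, but layer details and checkpoint format are assumptions):
```python
import numpy as np
import torch
from torch import nn

class FocusMLP(nn.Module):
    """Assumed 10 -> 64 -> 32 -> 1 architecture; the real model may differ in details."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x))

model = FocusMLP().eval()
features = torch.from_numpy(np.random.rand(1, 10).astype(np.float32))  # scaled 10-d feature vector
with torch.no_grad():
    p_focused = model(features).item()
print("focused" if p_focused >= 0.5 else "unfocused", p_focused)
```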
**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.