---
title: FocusGuard
emoji: 👁️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
short_description: Real-time webcam focus detection via MediaPipe + MLP/XGBoost
---
# FocusGuard
Real-time webcam-based visual attention estimation. MediaPipe Face Mesh landmarks are turned into 17 per-frame features (EAR, gaze ratios, head pose, PERCLOS); 10 of these are selected and routed through an MLP or XGBoost model for binary focused/unfocused classification. Includes a local OpenCV demo and a full React + FastAPI web app with WebSocket/WebRTC video streaming.

---
## Team
**Team name:** FocusGuards (5CCSAGAP Large Group Project)
**Members:** Yingao Zheng, Mohamed Alketbi, Abdelrahman Almatrooshi, Junhao Zhou, Kexin Wang, Langyuan Huang, Saba Al-Gafri, Ayten Arab, Jaroslav Rakoto-Miklas
---
## Links
### Project access
- Git repository: [GAP_Large_project](https://github.kcl.ac.uk/k23172173/GAP_Large_project)
- Deployed app (Hugging Face): [FocusGuard/final_v2](https://huggingface.co/spaces/FocusGuard/final_v2)
- ClearML experiments: [FocusGuards Large Group Project](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments)
### Data and checkpoints
- Checkpoints (Google Drive): [Download folder](https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link)
- Dataset (Google Drive): [Dataset folder](https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing)
- Data consent form (PDF): [Consent document](https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link)
The deployed app contains the full feature set (session history, L2CS calibration, model selector, achievements).
---
## Trained models
Model checkpoints are **not included** in the submission archive. Download them before running inference.
### Option 1: Hugging Face Space
Pre-trained checkpoints are available in the Hugging Face Space files:
```
https://huggingface.co/spaces/FocusGuard/final_v2/tree/main/checkpoints
```
Download and place into `checkpoints/`:
| File | Description |
|------|-------------|
| `mlp_best.pt` | PyTorch MLP (10-64-32-2, ~2,850 params) |
| `xgboost_face_orientation_best.json` | XGBoost (600 trees, depth 8, lr 0.1489) |
| `scaler_mlp.joblib` | StandardScaler fit on training data |
| `hybrid_focus_config.json` | Hybrid pipeline fusion weights |
| `hybrid_combiner.joblib` | Hybrid combiner |
| `L2CSNet_gaze360.pkl` | L2CS-Net ResNet50 gaze weights (96 MB) |
### Option 2: ClearML
Models are registered as ClearML OutputModels under project "FocusGuards Large Group Project".
| Model | Task ID | Model ID |
|-------|---------|----------|
| MLP | `3899b5aa0c3348b28213a3194322cdf7` | `56f94b799f624bdc845fa50c4d0606fe` |
| XGBoost | `c0ceb8e7e8194a51a7a31078cc47775c` | `6727b8de334f4ca0961c46b436f6fb7c` |
**UI:** Open a task on the [experiments page](https://app.5ccsagap.er.kcl.ac.uk/projects/ce218b2f751641c68042f8fa216f8746/experiments), go to Artifacts > Output Models, and download.
**Python:**
```python
from clearml import Model
mlp = Model(model_id="56f94b799f624bdc845fa50c4d0606fe")
mlp_path = mlp.get_local_copy() # downloads .pt
xgb = Model(model_id="6727b8de334f4ca0961c46b436f6fb7c")
xgb_path = xgb.get_local_copy() # downloads .json
```
Copy the downloaded files into `checkpoints/`.
### Option 3: Google Drive (submission fallback)
If ClearML access is restricted, download checkpoints from:
https://drive.google.com/drive/folders/15yYHKgCHg5AFIBb04XnVaeqHRukwBLAd?usp=drive_link
Place all files under `checkpoints/`.
### Option 4: Retrain from scratch
```bash
python -m models.mlp.train
python -m models.xgboost.train
```
This regenerates `checkpoints/mlp_best.pt`, `checkpoints/xgboost_face_orientation_best.json`, and scalers. Requires training data under `data/collected_*/`.
---
## Project layout
```
config/
default.yaml hyperparameters, thresholds, app settings
__init__.py config loader + ClearML flattener
clearml_enrich.py ClearML task enrichment + artifact upload
data_preparation/
prepare_dataset.py load/split/scale .npz files (pooled + LOPO)
data_exploration.ipynb EDA: distributions, class balance, correlations
models/
face_mesh.py MediaPipe 478-point face landmarks
head_pose.py yaw/pitch/roll via solvePnP, face-orientation score
eye_scorer.py EAR, MAR, gaze ratios, PERCLOS
collect_features.py real-time feature extraction + webcam labelling CLI
gaze_calibration.py 9-point polynomial gaze calibration
gaze_eye_fusion.py fuses calibrated gaze with eye openness
mlp/ MLP training, eval, Optuna sweep
xgboost/ XGBoost training, eval, ClearML + Optuna sweeps
L2CS-Net/ vendored L2CS-Net (ResNet50, Gaze360)
checkpoints/ (excluded from archive; see download instructions above)
notebooks/
mlp.ipynb MLP training + LOPO in Jupyter
xgboost.ipynb XGBoost training + LOPO in Jupyter
evaluation/
justify_thresholds.py LOPO threshold + weight grid search
feature_importance.py XGBoost gain + leave-one-feature-out ablation
grouped_split_benchmark.py pooled vs LOPO comparison
plots/ ROC curves, confusion matrices, weight searches
logs/ JSON training logs
tests/
test_*.py unit + integration tests (pytest)
.coveragerc coverage config
ui/
pipeline.py all 5 pipeline classes + output smoothing
live_demo.py OpenCV webcam demo
src/ React (Vite) frontend source
static/ built frontend assets (after npm build)
main.py FastAPI application entry point
package.json frontend package manifest
requirements.txt
pytest.ini
```
---
## Setup
Recommended versions:
- Python 3.10-3.11
- Node.js 18+ (needed only for frontend rebuild/dev)
```bash
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Then download checkpoints (see above).
If you need to rebuild frontend assets locally:
```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```
---
## Run
### Local OpenCV demo
```bash
python ui/live_demo.py
python ui/live_demo.py --xgb # XGBoost
```
Controls: `m` cycle mesh overlay, `1-5` switch pipeline mode, `q` quit.
### Web app (without Docker)
```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```
Open http://localhost:7860
### Web app (Docker)
```bash
docker-compose up # serves on port 7860
```
---
## Data collection
```bash
python -m models.collect_features --name <participant>
```
Records webcam sessions with real-time binary labelling (spacebar toggles focused/unfocused). Saves per-frame feature vectors to `data/collected_<participant>/` as `.npz` files. Raw video is never stored.
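For a quick sanity check, a collected session can be inspected along these lines (a minimal sketch; the array keys `features` and `labels` are assumptions — see `models/collect_features.py` for the names actually used when saving):
```python
import glob

import numpy as np

# Minimal sketch for inspecting collected sessions. The keys "features"
# and "labels" are assumptions; check models/collect_features.py for the
# names actually used when saving.
for path in sorted(glob.glob("data/collected_*/*.npz")):
    with np.load(path) as session:
        X, y = session["features"], session["labels"]
    # Assumes label 1 = focused, matching the spacebar toggle above.
    print(f"{path}: {X.shape[0]} frames, {y.mean():.1%} focused")
```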
Nine participants recorded 5-10 minute sessions across varied environments (144,793 frames in total: 61.5% focused, 38.5% unfocused). All participants provided informed consent. Dataset files are not included in this repository.
Consent document: https://drive.google.com/file/d/1g1Hc764ffljoKrjApD6nmWDCXJGYTR0j/view?usp=drive_link
The raw participant dataset is excluded from this submission (coursework policy and privacy constraints). It can be shared with module staff on request: https://drive.google.com/drive/folders/1fwACM6i6uVGFkTlJKSlqVhizzgrHl_gY?usp=sharing
---
## Pipeline
```
Webcam frame
--> MediaPipe Face Mesh (478 landmarks)
--> Head pose (solvePnP): yaw, pitch, roll, s_face, head_deviation
--> Eye scorer: EAR_left, EAR_right, EAR_avg, s_eye, MAR
--> Gaze ratios: h_gaze, v_gaze, gaze_offset
--> Temporal tracker: PERCLOS, blink_rate, closure_dur, yawn_dur
--> 17 features --> select 10 --> clip to physiological bounds
--> ML model (MLP / XGBoost) or geometric scorer
--> Asymmetric EMA smoothing (alpha_up=0.55, alpha_down=0.45)
--> FOCUSED / UNFOCUSED
```
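The final smoothing step can be illustrated as follows (a minimal sketch of an asymmetric EMA; our reading is that `alpha_up` applies when the raw score rises, so increases are adopted slightly faster than decreases — the actual implementation lives in `ui/pipeline.py`):
```python
def smooth_score(prev: float, raw: float,
                 alpha_up: float = 0.55, alpha_down: float = 0.45) -> float:
    """Asymmetric EMA: rising scores are adopted slightly faster than
    falling ones, so brief dips do not immediately flip the output state."""
    alpha = alpha_up if raw > prev else alpha_down
    return alpha * raw + (1 - alpha) * prev
```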
Five runtime modes share the same feature extraction backbone:
| Mode | Description |
|------|-------------|
| **Geometric** | Deterministic scoring: 0.7 * s_face + 0.3 * s_eye, cosine-decay with max_angle=22 deg |
| **XGBoost** | 600-tree gradient-boosted ensemble, threshold 0.28 (LOPO-optimal) |
| **MLP** | PyTorch 10-64-32-2 perceptron, threshold 0.23 (LOPO-optimal) |
| **Hybrid** | 30% MLP + 70% geometric ensemble (LOPO F1 = 0.841) |
| **L2CS** | Deep gaze estimation via L2CS-Net (ResNet50, Gaze360 pretrained) |
Any mode can be combined with L2CS Boost mode (35% base + 65% L2CS, fused threshold 0.52). Off-screen gaze produces a near-zero L2CS score via cosine decay, acting as a soft veto.
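A sketch of the geometric score and the boost fusion described above (illustrative only; it assumes the cosine decay maps 0 deg to 1.0 and `max_angle` or beyond to 0.0 — the real code lives in `ui/pipeline.py`):
```python
import math

MAX_ANGLE = 22.0  # degrees; head deviations at or beyond this score 0.0

def cosine_decay(angle_deg: float, max_angle: float = MAX_ANGLE) -> float:
    """Smoothly map an angular deviation of 0 -> 1.0 and >= max_angle -> 0.0."""
    a = min(abs(angle_deg), max_angle)
    return 0.5 * (1.0 + math.cos(math.pi * a / max_angle))

def geometric_score(head_deviation_deg: float, s_eye: float) -> float:
    """Geometric mode: 0.7 * s_face + 0.3 * s_eye."""
    s_face = cosine_decay(head_deviation_deg)
    return 0.7 * s_face + 0.3 * s_eye

def l2cs_boost(base_score: float, l2cs_score: float) -> bool:
    """Boost mode: 35% base + 65% L2CS against the fused threshold 0.52.
    Off-screen gaze drives l2cs_score towards 0, vetoing the base model."""
    return 0.35 * base_score + 0.65 * l2cs_score >= 0.52
```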
---
## Training
Both scripts read all hyperparameters from `config/default.yaml`.
```bash
python -m models.mlp.train
python -m models.xgboost.train
```
Outputs: `checkpoints/` (model + scaler) and `evaluation/logs/` (CSVs, JSON summaries).
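For reference, the 10-64-32-2 MLP can be written down in a few lines of PyTorch (a minimal sketch; the ReLU activations and class name are assumptions — the actual definition lives under `models/mlp/`):
```python
import torch.nn as nn

class FocusMLP(nn.Module):
    """Sketch of the 10-64-32-2 architecture:
    10*64+64 + 64*32+32 + 32*2+2 = 2,850 parameters,
    matching the checkpoint table above."""
    def __init__(self, n_features: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 2),  # logits for unfocused / focused
        )

    def forward(self, x):
        return self.net(x)
```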
### ClearML experiment tracking
```bash
USE_CLEARML=1 python -m models.mlp.train
USE_CLEARML=1 CLEARML_QUEUE=gpu python -m models.xgboost.train
USE_CLEARML=1 python -m evaluation.justify_thresholds --clearml
```
Logs hyperparameters, per-epoch scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts (config YAML, requirements.txt, git SHA).
Reference experiment IDs:
| Model | ClearML experiment ID |
|-------|------------------------|
| MLP (`models.mlp.train`) | `3899b5aa0c3348b28213a3194322cdf7` |
| XGBoost (`models.xgboost.train`) | `c0ceb8e7e8194a51a7a31078cc47775c` |
---
## Evaluation
```bash
python -m evaluation.justify_thresholds # LOPO threshold + weight search
python -m evaluation.grouped_split_benchmark # pooled vs LOPO comparison
python -m evaluation.feature_importance # XGBoost gain + LOFO ablation
```
### Results (pooled random split, 15% test)
| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64-32) | 92.92% | 0.929 | 0.971 |
### Results (LOPO, 9 participants)
| Model | LOPO AUC | Best threshold (Youden's J) | F1 at best threshold |
|-------|----------|-----------------------------|----------------------|
| MLP | 0.862 | 0.228 | 0.858 |
| XGBoost | 0.870 | 0.280 | 0.855 |
Best geometric face weight (alpha) = 0.7 (mean LOPO F1 = 0.820).
Best hybrid MLP weight (w_mlp) = 0.3 (mean LOPO F1 = 0.841).
The ~12 percentage-point drop from pooled to LOPO reflects temporal data leakage in the pooled random split and confirms LOPO as the primary generalisation metric.
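Youden's J simply picks the ROC operating point maximising TPR minus FPR; a minimal scikit-learn sketch of that selection (the project's own search lives in `evaluation/justify_thresholds.py`):
```python
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Return the decision threshold maximising Youden's J = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return float(thresholds[np.argmax(tpr - fpr)])
```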
### Feature ablation
| Channel subset | Mean LOPO F1 |
|----------------|-------------|
| All 10 features | 0.829 |
| Eye state only | 0.807 |
| Head pose only | 0.748 |
| Gaze only | 0.726 |
Top-5 XGBoost gain: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68).
---
## L2CS Gaze Tracking
L2CS-Net predicts where the eyes are looking rather than just where the head is pointed, catching the case where the head faces the screen but the eyes wander.
**Standalone mode:** Select L2CS as the model.
**Boost mode:** Select any other model, then enable the GAZE toggle. L2CS runs alongside the base model with score-level fusion (35% base / 65% L2CS). Off-screen gaze triggers a soft veto.
**Calibration:** Click Calibrate during a session. A fullscreen overlay shows 9 target dots (3x3 grid). After all 9 points, a degree-2 polynomial maps gaze angles to screen coordinates with IQR outlier filtering and centre-point bias correction.
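The degree-2 fit itself is a small least-squares problem; a minimal sketch (omitting the IQR outlier filtering and centre-bias correction, with array shapes as assumptions — the real logic is in `models/gaze_calibration.py`):
```python
import numpy as np

def fit_calibration(gaze: np.ndarray, screen: np.ndarray) -> np.ndarray:
    """Fit a degree-2 polynomial map from gaze angles (N, 2) to screen
    coordinates (N, 2). Returns a (6, 2) coefficient matrix."""
    x, y = gaze[:, 0], gaze[:, 1]
    # Degree-2 design matrix: [1, x, y, x^2, x*y, y^2]
    A = np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return coeffs

def apply_calibration(coeffs: np.ndarray, gaze_xy: np.ndarray) -> np.ndarray:
    """Map one (x, y) gaze angle pair to screen coordinates."""
    x, y = gaze_xy
    return np.array([1.0, x, y, x * x, x * y, y * y]) @ coeffs
```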
L2CS weight lookup order in runtime:
1. `checkpoints/L2CSNet_gaze360.pkl`
2. `models/L2CS-Net/models/L2CSNet_gaze360.pkl`
3. `models/L2CSNet_gaze360.pkl`
---
## Config
All hyperparameters and app settings are in `config/default.yaml`. Override with `FOCUSGUARD_CONFIG=/path/to/custom.yaml`.
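For example (illustrative usage only):
```bash
cp config/default.yaml my_config.yaml
# edit thresholds / app settings in my_config.yaml, then:
FOCUSGUARD_CONFIG=$PWD/my_config.yaml python ui/live_demo.py
```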
---
## Tests
Included checks:
- data prep helpers and real split consistency (`test_data_preparation.py`; split test **skips** if `data/collected_*/*.npz` is absent)
- feature clipping (`test_models_clip_features.py`)
- pipeline integration (`test_pipeline_integration.py`)
- gaze calibration / fusion diagnostics (`test_gaze_pipeline.py`)
- FastAPI health, settings, sessions (`test_health_endpoint.py`, `test_api_settings.py`, `test_api_sessions.py`)
```bash
pytest
```
Coverage is enabled by default via `pytest.ini` (`--cov` / term report). For HTML coverage: `pytest --cov-report=html`.
**Stack:** Python, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net, FastAPI, React/Vite, SQLite, Docker, ClearML, pytest.