# models

Feature extraction, geometric scoring, and ML model training. Shared modules at the top level handle the core computer vision pipeline; subdirectories contain model-specific training and sweep scripts.

## Inference pipeline

```
Webcam frame
  |
  v
MediaPipe Face Mesh (face_mesh.py) --> 478 landmarks
  |
  +---> HeadPoseEstimator (head_pose.py)    --> yaw, pitch, roll, s_face
  +---> EyeBehaviourScorer (eye_scorer.py)  --> EAR, s_eye, MAR
  +---> GazeRatio (eye_scorer.py)           --> h_gaze, v_gaze, gaze_offset
  +---> TemporalTracker (collect_features.py) --> PERCLOS, blink_rate, closure_dur
  |
  v
17-feature vector --> clip --> select 10 --> ML model or geometric scorer
```

## Shared modules

| File | Purpose |
|------|---------|
| `face_mesh.py` | MediaPipe Face Landmarker wrapper (478 landmarks including 10 iris points) |
| `head_pose.py` | `HeadPoseEstimator`: solvePnP on 6 anatomical landmarks (nose tip, chin, eye corners, mouth corners), cosine-decay face orientation score with max_angle=22 deg and roll down-weighted 50% |
| `eye_scorer.py` | `EyeBehaviourScorer`: EAR from 6 landmarks per eye (open=0.30, closed=0.16), iris-based gaze scoring (cosine decay, max_offset=0.28), MAR yawn detection (threshold=0.55) |
| `collect_features.py` | 17-feature extraction with `TemporalTracker` (PERCLOS over 60 frames, blink rate over 30s window); webcam labelling CLI for data collection |
| `gaze_calibration.py` | `GazeCalibration`: 9-point polynomial (degree-2) mapping from raw L2CS gaze angles to normalised screen coordinates, with IQR outlier filtering and centre-point bias correction |
| `gaze_eye_fusion.py` | `GazeEyeFusion`: fuses calibrated gaze position with EAR for continuous focus scoring; sustained eye closure veto (4+ frames) |

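As a rough illustration of the eye metrics named in the table, the sketch below computes EAR with the commonly used six-point formula and a rolling PERCLOS over 60 frames. The landmark ordering, the names `eye_aspect_ratio` and `RollingPerclos`, and the choice of the closed threshold 0.16 to mark a frame as closed are assumptions for illustration; `eye_scorer.py` and `TemporalTracker` may differ.

```python
from collections import deque
import math


def eye_aspect_ratio(pts):
    """EAR from six (x, y) eye landmarks ordered p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners,
    (p2, p6) and (p3, p5) are the upper/lower lid pairs.
    """
    p1, p2, p3, p4, p5, p6 = pts
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    # Guard against a degenerate eye with zero width.
    return vertical / (2.0 * horizontal) if horizontal > 0 else 0.0


class RollingPerclos:
    """Fraction of the most recent `window` frames with the eye closed."""

    def __init__(self, window=60, closed_thresh=0.16):
        self.closed_thresh = closed_thresh  # assumed closure threshold
        self.flags = deque(maxlen=window)   # oldest frames drop off automatically

    def update(self, ear):
        self.flags.append(ear < self.closed_thresh)
        return sum(self.flags) / len(self.flags)
```

Calling `update` once per frame with the current EAR returns the running closed-frame fraction; until 60 frames have been seen, the fraction is taken over the frames available so far.
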
## Subdirectories

| Directory | Contents |
|-----------|----------|
| [`mlp/`](mlp/) | PyTorch MLP (10-64-32-2, ~2,850 params): training, evaluation, Optuna sweep |
| [`xgboost/`](xgboost/) | XGBoost (600 trees, depth 8, lr 0.1489): training, evaluation, ClearML + Optuna sweeps |
| `L2CS-Net/` | Vendored L2CS-Net gaze estimator (ResNet50 pretrained on Gaze360) |

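The quoted MLP parameter count can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The helper below is illustrative only and assumes plain dense layers with biases and no other learnable parameters (for example, no batch normalisation).

```python
def dense_param_count(layer_sizes):
    """Total weights + biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 10-64-32-2: (10*64 + 64) + (64*32 + 32) + (32*2 + 2) = 704 + 2080 + 66
print(dense_param_count([10, 64, 32, 2]))  # 2850
```
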
## Data collection

```bash
python -m models.collect_features --name <participant>
```

Records a webcam session with real-time binary labelling (spacebar toggles focused/unfocused). Outputs `.npz` files to `data/collected_<participant>/` containing the 17-feature vector and labels per frame. Quality guidance is displayed during recording (class balance warnings, transition count).

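To inspect a recorded session, `numpy.load` can list the arrays stored in an `.npz` file without assuming their names. The helper name `summarise_npz` is hypothetical, and the array keys are not documented here, so this sketch only reports whatever keys and shapes it finds.

```python
import numpy as np


def summarise_npz(path):
    """Return {array_name: shape} for every array in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```
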
Nine participants each recorded 5-10 minute sessions across varied environments, totalling 144,793 frames (61.5% focused, 38.5% unfocused). Only extracted feature vectors are stored; raw video is never saved.

## Geometric scoring formulas

**Face orientation score:** `s_face = 0.5 * (1 + cos(pi * min(d / 22, 1)))` where `d = sqrt(yaw^2 + pitch^2 + (0.5*roll)^2)`

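A direct transcription of the face orientation formula into Python, as a minimal sketch. The function name and keyword arguments are illustrative, angles are assumed to be in degrees, and the real `HeadPoseEstimator` may structure this differently.

```python
import math


def face_orientation_score(yaw, pitch, roll, max_angle=22.0, roll_weight=0.5):
    """Cosine-decay score: 1.0 when facing the camera, 0.0 at or beyond max_angle."""
    # Combined angular deviation, with roll down-weighted by 50%.
    d = math.sqrt(yaw ** 2 + pitch ** 2 + (roll_weight * roll) ** 2)
    # min(..., 1) saturates the score at 0 once d reaches max_angle.
    return 0.5 * (1.0 + math.cos(math.pi * min(d / max_angle, 1.0)))
```
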
**Eye behaviour score:** `s_eye = ear_score * gaze_score`, where `ear_score` maps EAR linearly from [0.16, 0.30] to [0, 1] and `gaze_score` applies the same cosine decay to the gaze offset, with max_offset=0.28
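
The eye behaviour score can be sketched the same way. The function names are illustrative, and clamping EAR values outside [0.16, 0.30] to the ends of the range is an assumption, since only the linear mapping itself is stated here.

```python
import math


def ear_score(ear, closed=0.16, open_=0.30):
    """Map EAR linearly from [closed, open_] to [0, 1]."""
    t = (ear - closed) / (open_ - closed)
    # Assumption: values outside the range are clamped rather than extrapolated.
    return min(max(t, 0.0), 1.0)


def gaze_score(offset, max_offset=0.28):
    """Cosine decay: 1.0 at zero offset, 0.0 at or beyond max_offset."""
    return 0.5 * (1.0 + math.cos(math.pi * min(abs(offset) / max_offset, 1.0)))


def eye_behaviour_score(ear, gaze_offset):
    """s_eye = ear_score * gaze_score."""
    return ear_score(ear) * gaze_score(gaze_offset)
```

Because the two terms are multiplied, either a closed eye or a gaze offset at the limit drives `s_eye` to zero on its own.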