# ui

Real-time inference pipelines and demo interface. This package bridges the trained models with live webcam input, producing frame-by-frame focus predictions.
## Pipeline modes

FocusGuard supports five runtime modes, all sharing the same feature extraction backbone:

| Mode | Pipeline class | What it does |
|------|---------------|-------------|
| **Geometric** | `FaceMeshPipeline` | Deterministic scoring from head pose and eye state. No ML model needed. Fastest option. |
| **MLP** | `MLPPipeline` | 10 features through the PyTorch MLP (10-64-32-2). Threshold: 0.23 (LOPO Youden's J). |
| **XGBoost** | `XGBoostPipeline` | 10 features through XGBoost (600 trees). Threshold: 0.28 (LOPO Youden's J). |
| **Hybrid** | `HybridPipeline` | 30% MLP + 70% geometric ensemble (w_mlp=0.3, alpha=0.7). LOPO F1: 0.841. |
| **L2CS** | `L2CSPipeline` | Deep gaze estimation via L2CS-Net (ResNet50). Standalone focus scoring from gaze direction. |

Any mode can be combined with L2CS **Boost mode** (toggled in the UI), which fuses the base score (35%) with the L2CS gaze score (65%) and applies a gaze-based veto for off-screen looks.
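The hybrid ensemble and Boost fusion are both plain weighted averages. A minimal sketch, assuming the weights quoted above; the function names are illustrative, not the actual `pipeline.py` API:

```python
# Hypothetical helpers mirroring the weights described in the table above.

def hybrid_score(mlp_score: float, geometric_score: float,
                 w_mlp: float = 0.3) -> float:
    """Hybrid mode: 30% MLP probability + 70% geometric score."""
    return w_mlp * mlp_score + (1.0 - w_mlp) * geometric_score

def boosted_score(base_score: float, l2cs_score: float,
                  w_l2cs: float = 0.65) -> float:
    """L2CS Boost fusion: 35% base model + 65% L2CS gaze score."""
    return (1.0 - w_l2cs) * base_score + w_l2cs * l2cs_score
```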
## Output smoothing

All pipelines use asymmetric EMA (`_OutputSmoother`) to stabilise predictions:

| Parameter | Value | Effect |
|-----------|-------|--------|
| alpha_up | 0.55 | Fast rise: recognises focus quickly |
| alpha_down | 0.45 | Slower fall: avoids flicker on brief glances |
| grace_frames | 10 (~0.33 s at 30 fps) | Holds the score steady when the face is briefly occluded |
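A minimal sketch of an asymmetric EMA smoother with a grace period, mirroring the parameters above; the real `_OutputSmoother` may differ in detail:

```python
from typing import Optional

class OutputSmoother:
    """Asymmetric EMA: rises faster than it falls (illustrative sketch)."""

    def __init__(self, alpha_up: float = 0.55, alpha_down: float = 0.45,
                 grace_frames: int = 10):
        self.alpha_up = alpha_up
        self.alpha_down = alpha_down
        self.grace_frames = grace_frames
        self.value: Optional[float] = None
        self.missing = 0

    def update(self, raw: Optional[float]) -> float:
        if raw is None:                       # face briefly occluded
            self.missing += 1
            if self.value is not None and self.missing <= self.grace_frames:
                return self.value             # hold the score during grace
            return 0.0                        # give up after the grace window
        self.missing = 0
        if self.value is None:
            self.value = raw
        else:
            # Larger alpha when rising: focus is recognised quickly,
            # while brief glances away decay more slowly.
            alpha = self.alpha_up if raw > self.value else self.alpha_down
            self.value = alpha * raw + (1.0 - alpha) * self.value
        return self.value
```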
## Geometric scoring

`FaceMeshPipeline` computes:

- `s_face`: cosine-decay face orientation score from solvePnP (max_angle = 22 deg, roll down-weighted 50%)
- `s_eye`: EAR-based eye openness score multiplied by the iris gaze score
- Combined score: `0.7 * s_face + 0.3 * s_eye` (weights from LOPO grid search)
- MAR yawn veto: MAR > 0.55 overrides the score to unfocused
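The combination above can be sketched as follows. The exact decay shape and helper names are assumptions, not the `pipeline.py` code:

```python
import math

def face_score(angle_deg: float, max_angle: float = 22.0) -> float:
    """Cosine decay from a frontal pose down to zero at max_angle."""
    if angle_deg >= max_angle:
        return 0.0
    return math.cos(math.pi / 2 * angle_deg / max_angle)

def geometric_score(s_face: float, s_eye: float, mar: float) -> float:
    """Weighted face/eye combination with the MAR yawn veto."""
    if mar > 0.55:          # yawn veto: wide-open mouth forces unfocused
        return 0.0
    return 0.7 * s_face + 0.3 * s_eye
```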
## L2CS Boost mode

When enabled alongside any base model:

1. L2CS-Net predicts gaze yaw/pitch from the face crop
2. Calibrated gaze is mapped to screen coordinates (if calibration was done)
3. Fusion: `0.35 * base_score + 0.65 * l2cs_score` with a fused threshold of 0.52
4. Off-screen gaze produces a near-zero L2CS score via cosine decay, dragging the fused score below the threshold (soft veto)

This catches the key edge case where the head faces the screen but the eyes wander to a second monitor or phone.
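The soft veto works because the L2CS weight dominates: even a fully confident base model yields `0.35 * 1.0 + 0.65 * 0.0 = 0.35`, which is below the 0.52 threshold. A sketch under the assumption of a cosine decay over gaze angle (the decay width and names here are illustrative):

```python
import math

def l2cs_gaze_score(gaze_angle_deg: float, max_angle: float = 25.0) -> float:
    """Near 1.0 when looking at the screen, near 0 when looking away."""
    if abs(gaze_angle_deg) >= max_angle:
        return 0.0
    return math.cos(math.pi / 2 * abs(gaze_angle_deg) / max_angle)

def fused_focus(base_score: float, gaze_angle_deg: float,
                threshold: float = 0.52) -> bool:
    """Boost-mode fusion: the gaze term can veto a confident base score."""
    fused = 0.35 * base_score + 0.65 * l2cs_gaze_score(gaze_angle_deg)
    return fused >= threshold
```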
## Files

| File | Purpose |
|------|---------|
| `pipeline.py` | All pipeline classes, feature clipping, output smoothing, hybrid config, runtime feature engine |
| `live_demo.py` | OpenCV webcam demo with real-time overlay (bounding box, mesh, gaze lines, score bar) |
## Local demo

```bash
python ui/live_demo.py        # MLP (default)
python ui/live_demo.py --xgb  # XGBoost
```

Controls: `m` cycles the mesh overlay, `1`-`5` switch the pipeline mode, `q` quits.
## Web application

The full web app (React frontend + FastAPI backend) runs from `main.py` in the project root:

- **WebSocket** (`/ws/video`): frame-slot architecture; only the most recent frame is processed, stale frames are dropped
- **WebRTC** (`/api/webrtc/offer`): SDP exchange + ICE gathering for lower-latency streaming
- Inference is offloaded to a `ThreadPoolExecutor` (4 workers, per-pipeline locks)
- A SQLite database persists sessions and per-frame events via `EventBuffer` (flushed every 2 s)
- Frontend pages: focus tracking with live overlays, session records, achievements/gamification, model customisation, 9-point gaze calibration, help documentation
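The frame-slot idea behind the WebSocket endpoint can be sketched with a one-element "slot" that always holds only the newest frame, so a slow inference worker never chews through a backlog of stale frames. This is an illustrative sketch, not the `main.py` implementation:

```python
import asyncio

class FrameSlot:
    """Single-slot queue: a new frame silently replaces the unprocessed one."""

    def __init__(self) -> None:
        self._frame = None
        self._event = asyncio.Event()

    def put(self, frame) -> None:
        self._frame = frame          # overwrite: the stale frame is dropped
        self._event.set()

    async def get(self):
        await self._event.wait()
        self._event.clear()
        frame, self._frame = self._frame, None
        return frame

async def demo():
    slot = FrameSlot()
    for i in range(3):               # producer outpaces the consumer...
        slot.put(f"frame-{i}")
    return await slot.get()          # ...so only the newest frame survives
```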
Deployment via Docker: `docker-compose up` (port 7860). Vite builds the frontend into FastAPI's static directory; L2CS-Net weights are pulled at runtime via `huggingface_hub`.