Abdelrahman Almatrooshi
Deploy snapshot from main b7a59b11809483dfc959f196f1930240f2662c49
# ui
Real-time inference pipelines and demo interface. This package bridges the trained models with live webcam input, producing frame-by-frame focus predictions.
## Pipeline modes
FocusGuard supports five runtime modes, all sharing the same feature extraction backbone:
| Mode | Pipeline class | What it does |
|------|---------------|-------------|
| **Geometric** | `FaceMeshPipeline` | Deterministic scoring from head pose and eye state. No ML model needed. Fastest option. |
| **MLP** | `MLPPipeline` | 10 features through the PyTorch MLP (10-64-32-2). Threshold: 0.23 (LOPO Youden's J). |
| **XGBoost** | `XGBoostPipeline` | 10 features through XGBoost (600 trees). Threshold: 0.28 (LOPO Youden's J). |
| **Hybrid** | `HybridPipeline` | 30% MLP + 70% geometric ensemble (w_mlp=0.3, alpha=0.7). LOPO F1: 0.841. |
| **L2CS** | `L2CSPipeline` | Deep gaze estimation via L2CS-Net (ResNet50). Standalone focus scoring from gaze direction. |
Any mode can be combined with L2CS **Boost mode** (toggled in the UI), which fuses the base score (35%) with the L2CS gaze score (65%) and applies a gaze-based veto for off-screen looks.
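As a minimal sketch of how the per-mode decision thresholds from the table might be applied (the threshold values come from the table; the function and dict names are illustrative, not the actual API):

```python
# Hypothetical sketch: per-mode thresholds from the table above.
# 0.23 (MLP) and 0.28 (XGBoost) are the LOPO Youden's J operating points.
THRESHOLDS = {"mlp": 0.23, "xgb": 0.28}

def classify(mode: str, score: float) -> bool:
    """Return True (focused) when the model score clears the mode threshold."""
    return score >= THRESHOLDS[mode]
```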
## Output smoothing
All pipelines use asymmetric EMA (`_OutputSmoother`) to stabilise predictions:
| Parameter | Value | Effect |
|-----------|-------|--------|
| alpha_up | 0.55 | Fast rise: recognises focus quickly |
| alpha_down | 0.45 | Slower fall: avoids flicker on brief glances |
| grace_frames | 10 (~0.33s at 30fps) | Holds score steady when face is briefly occluded |
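A minimal sketch of an asymmetric EMA smoother in the spirit of `_OutputSmoother`. The parameter values come from the table; the class internals and occlusion handling are assumptions, not the actual implementation:

```python
class OutputSmoother:
    """Asymmetric EMA: rises fast toward focus, falls slowly (sketch)."""

    def __init__(self, alpha_up=0.55, alpha_down=0.45, grace_frames=10):
        self.alpha_up = alpha_up
        self.alpha_down = alpha_down
        self.grace_frames = grace_frames
        self.value = None      # smoothed score
        self.missing = 0       # consecutive frames with no face

    def update(self, score):
        """score=None means the face was not detected this frame."""
        if score is None:
            self.missing += 1
            # hold the last value for up to grace_frames of occlusion
            return self.value if self.missing <= self.grace_frames else None
        self.missing = 0
        if self.value is None:
            self.value = score
        else:
            # fast rise (alpha_up), slower fall (alpha_down) to avoid flicker
            alpha = self.alpha_up if score > self.value else self.alpha_down
            self.value = alpha * score + (1 - alpha) * self.value
        return self.value
```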
## Geometric scoring
`FaceMeshPipeline` computes:
- `s_face`: cosine-decay face orientation score from solvePnP (max_angle=22 deg, roll down-weighted 50%)
- `s_eye`: EAR-based eye openness score multiplied by iris gaze score
- Combined score: `0.7 * s_face + 0.3 * s_eye` (weights from LOPO grid search)
- MAR yawn veto: MAR > 0.55 overrides to unfocused
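The combination above can be sketched as follows. The weights (0.7/0.3), `max_angle=22`, and the MAR > 0.55 veto come from the text; the function names and the exact cosine-decay shape are assumptions:

```python
import math

def cosine_decay(angle_deg, max_angle=22.0):
    """1.0 at 0 deg, ramping down to 0.0 at max_angle (assumed shape)."""
    a = min(abs(angle_deg), max_angle)
    return 0.5 * (1.0 + math.cos(math.pi * a / max_angle))

def geometric_score(s_face, s_eye, mar):
    """Weighted combination with the MAR yawn veto from the text."""
    if mar > 0.55:                      # yawn detected: override to unfocused
        return 0.0
    return 0.7 * s_face + 0.3 * s_eye
```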
## L2CS Boost mode
When enabled alongside any base model:
1. L2CS-Net predicts gaze yaw/pitch from the face crop
2. Calibrated gaze is mapped to screen coordinates (if calibration was done)
3. Fusion: `0.35 * base_score + 0.65 * l2cs_score` with fused threshold 0.52
4. Off-screen gaze produces near-zero L2CS score via cosine decay, dragging fused score below threshold (soft veto)
This catches the key edge case where head faces the screen but eyes wander to a second monitor or phone.
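The fusion step above can be sketched in a few lines. The weights (0.35/0.65) and fused threshold (0.52) are from the text; the function name is illustrative. Note how an off-screen gaze (near-zero L2CS score) caps the fused score at 0.35, below the threshold, even when the base score is 1.0:

```python
def fuse(base_score, l2cs_score, threshold=0.52):
    """Boost-mode fusion sketch: weighted sum plus soft gaze veto."""
    fused = 0.35 * base_score + 0.65 * l2cs_score
    return fused, fused >= threshold    # (score, focused?)
```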
## Files
| File | Purpose |
|------|---------|
| `pipeline.py` | All pipeline classes, feature clipping, output smoothing, hybrid config, runtime feature engine |
| `live_demo.py` | OpenCV webcam demo with real-time overlay (bounding box, mesh, gaze lines, score bar) |
## Local demo
```bash
python ui/live_demo.py # MLP (default)
python ui/live_demo.py --xgb # XGBoost
```
Controls: `m` cycle mesh overlay, `1-5` switch pipeline mode, `q` quit.
## Web application
The full web app (React frontend + FastAPI backend) runs from `main.py` in the project root:
- **WebSocket** (`/ws/video`): frame-slot architecture, only most recent frame processed, stale frames dropped
- **WebRTC** (`/api/webrtc/offer`): SDP exchange + ICE gathering for lower-latency streaming
- Inference offloaded to `ThreadPoolExecutor` (4 workers, per-pipeline locks)
- SQLite database persists sessions and per-frame events via `EventBuffer` (flushes every 2s)
- Frontend pages: Focus tracking with live overlays, session records, achievements/gamification, model customisation, 9-point gaze calibration, help documentation
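The frame-slot idea can be illustrated as a one-element mailbox where a newer frame overwrites a stale one, so inference always sees the most recent frame. The real app runs this over an asyncio WebSocket; this thread-safe version is only a sketch of the drop-stale behaviour:

```python
import threading

class FrameSlot:
    """One-slot mailbox: a new frame replaces any unprocessed one (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0   # stale frames overwritten before processing

    def put(self, frame):
        with self._lock:
            if self._frame is not None:
                self.dropped += 1   # previous frame was never taken
            self._frame = frame

    def take(self):
        with self._lock:
            frame, self._frame = self._frame, None
            return frame
```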
Deployment via Docker: `docker-compose up` (port 7860). Vite builds the frontend statically into FastAPI's static directory. L2CS-Net weights are pulled at runtime via `huggingface_hub`.