---
title: FocusGuard
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
---

# FocusGuard

Webcam-based focus detection: MediaPipe face mesh -> 17 features (EAR, gaze, head pose, PERCLOS, etc.) -> MLP or XGBoost classifier -> focused / unfocused. The app is a React frontend with a FastAPI backend; video frames are sent over a WebSocket.

## Project layout

```
├── data/                 collected_<name>/*.npz
├── data_preparation/     loaders, split, scale
├── notebooks/            MLP/XGB training + LOPO
├── models/               face_mesh, head_pose, eye_scorer, train scripts
│   ├── gaze_calibration.py   9-point polynomial gaze calibration
│   ├── gaze_eye_fusion.py    Fuses calibrated gaze with eye openness
│   └── L2CS-Net/              In-tree L2CS-Net repo with Gaze360 weights
├── checkpoints/          mlp_best.pt, xgboost_*_best.json, scalers
├── evaluation/           logs, plots, justify_thresholds
├── ui/                   pipeline.py, live_demo.py
├── src/                  React frontend
│   ├── components/
│   │   ├── FocusPageLocal.jsx      Main focus page (camera, controls, model selector)
│   │   └── CalibrationOverlay.jsx  Fullscreen calibration UI
│   └── utils/
│       └── VideoManagerLocal.js    WebSocket client, frame capture, canvas rendering
├── static/               built frontend (after npm run build)
├── main.py, app.py       FastAPI backend
├── requirements.txt
└── package.json
```

## Setup

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

To rebuild the frontend after changes:

```bash
npm install
npm run build
mkdir -p static && cp -r dist/* static/
```

## Run

**Web app:** Activate the venv and launch uvicorn with `python -m uvicorn` so it runs against the venv's dependencies (otherwise it can fail with `ModuleNotFoundError: No module named 'aiosqlite'`):

```bash
source venv/bin/activate
python -m uvicorn main:app --host 0.0.0.0 --port 7860
```

Then open http://localhost:7860.

**Frontend dev server (optional, for React development):**

```bash
npm run dev
```

**OpenCV demo:**

```bash
python ui/live_demo.py
python ui/live_demo.py --xgb
```

**Train:**

```bash
python -m models.mlp.train
python -m models.xgboost.train
```

## Data

9 participants, 144,793 samples, 10 features, binary labels. Collect with `python -m models.collect_features --name <name>`. Data lives in `data/collected_<name>/`.
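
Because samples are grouped per participant, a leave-one-participant-out (LOPO) evaluation holds out one person's data per fold. The following is a minimal sketch of that split, not the notebooks' actual code: `lopo_splits` and the participant names are hypothetical, and it assumes you already have one participant label per sample.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def lopo_splits(groups: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out, train_idx, test_idx), one fold per participant."""
    by_participant: Dict[str, List[int]] = defaultdict(list)
    for idx, name in enumerate(groups):
        by_participant[name].append(idx)

    for held_out in sorted(by_participant):
        test_idx = by_participant[held_out]
        # Every sample that does not belong to the held-out participant.
        train_idx = sorted(
            i
            for name, idxs in by_participant.items()
            if name != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical labels: one participant name per sample.
groups = ["alice", "alice", "bob", "carol", "bob"]
folds = list(lopo_splits(groups))
for held_out, train_idx, test_idx in folds:
    print(held_out, len(train_idx), len(test_idx))
```

Splitting by participant rather than by random sample keeps frames from the same person out of both train and test sets.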

## Models

| Model | What it uses | Best for |
|-------|-------------|----------|
| **Geometric** | Head pose angles + eye aspect ratio (EAR) | Fast, no ML needed |
| **XGBoost** | Trained classifier on head/eye features (600 trees, depth 8) | Balanced accuracy/speed |
| **MLP** | Neural network on same features (64->32) | Higher accuracy |
| **Hybrid** | Weighted MLP + Geometric ensemble | Best head-pose accuracy |
| **L2CS** | Deep gaze estimation (ResNet50, Gaze360 weights) | Detects eye-only gaze shifts |

## Model numbers (15% test split)

| Model | Accuracy | F1 | ROC-AUC |
|-------|----------|-----|---------|
| XGBoost (600 trees, depth 8) | 95.87% | 0.959 | 0.991 |
| MLP (64->32) | 92.92% | 0.929 | 0.971 |

## L2CS Gaze Tracking

L2CS-Net predicts where your eyes are looking, not just where your head is pointed. This catches the scenario where your head faces the screen but your eyes wander.

### Standalone mode
Select **L2CS** as the model. It produces the focus decision on its own, with no base model involved.

### Boost mode
Select any other model, then click the **GAZE** toggle. L2CS runs alongside the base model:
- Base model handles head pose and eye openness (35% weight)
- L2CS handles gaze direction (65% weight)
- If L2CS detects that the gaze is clearly off-screen, it **vetoes** the base model regardless of the base model's score
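
The weighting and veto above can be sketched as follows. This is an illustration only, not the code in `gaze_eye_fusion.py`: the function name `fuse_scores`, the 0.5 decision threshold, and the assumption that both scores lie in [0, 1] are hypothetical; only the 35% / 65% weights and the veto rule come from the description above.

```python
from typing import Tuple


def fuse_scores(
    base_score: float,
    gaze_score: float,
    gaze_off_screen: bool,
    base_weight: float = 0.35,   # base model weight, from the list above
    gaze_weight: float = 0.65,   # L2CS weight, from the list above
    threshold: float = 0.5,      # assumed decision threshold
) -> Tuple[float, bool]:
    """Return (fused_score, focused) for one frame."""
    fused = base_weight * base_score + gaze_weight * gaze_score
    if gaze_off_screen:
        # Veto: report unfocused no matter how high the fused score is.
        return fused, False
    return fused, fused >= threshold


print(fuse_scores(0.9, 0.8, gaze_off_screen=False))
print(fuse_scores(1.0, 1.0, gaze_off_screen=True))
```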

### Calibration
After enabling L2CS or Gaze Boost, click **Calibrate** while a session is running:
1. A fullscreen overlay shows 9 target dots (3x3 grid)
2. Look at each dot as the progress ring fills
3. The first dot (centre) sets your baseline gaze offset
4. After all 9 points, a polynomial model maps your gaze angles to screen coordinates
5. A cyan tracking dot appears on the video showing where you're looking
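
The mapping in steps 3 and 4 can be sketched as a least-squares polynomial fit. This is a hedged illustration rather than the contents of `gaze_calibration.py`: the second-order feature set, the function names, the normalised screen coordinates, and the synthetic angles are all assumptions.

```python
import numpy as np


def poly_features(yaw, pitch):
    """Second-order terms [1, yaw, pitch, yaw*pitch, yaw^2, pitch^2] per sample."""
    yaw = np.asarray(yaw, dtype=float)
    pitch = np.asarray(pitch, dtype=float)
    return np.column_stack(
        [np.ones_like(yaw), yaw, pitch, yaw * pitch, yaw**2, pitch**2]
    )


def fit_calibration(angles, targets):
    """Fit gaze angles (N, 2) to screen targets (N, 2); row 0 is the centre dot."""
    angles = np.asarray(angles, dtype=float)
    targets = np.asarray(targets, dtype=float)
    offset = angles[0]                      # baseline gaze offset from the centre dot
    centred = angles - offset
    design = poly_features(centred[:, 0], centred[:, 1])
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return offset, coeffs


def predict_screen(offset, coeffs, yaw, pitch):
    """Map one (yaw, pitch) reading to normalised screen coordinates."""
    design = poly_features([yaw - offset[0]], [pitch - offset[1]])
    return (design @ coeffs)[0]


# Synthetic 3x3 session: centre dot first, then the remaining eight dots.
grid = [(x, y) for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)]
targets = np.array([(0.5, 0.5)] + [p for p in grid if p != (0.5, 0.5)])
# Made-up gaze angles in degrees, linearly related to the targets.
angles = (targets - 0.5) * np.array([40.0, 30.0]) + np.array([3.0, -2.0])

offset, coeffs = fit_calibration(angles, targets)
point = predict_screen(offset, coeffs, yaw=13.0, pitch=-2.0)
print(point)
```

Nine points give more equations than the six coefficients per axis, so the fit averages out some measurement noise instead of interpolating every dot exactly.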

## Pipeline

1. Face mesh (MediaPipe 478 pts)
2. Head pose -> yaw, pitch, roll, scores, gaze offset
3. Eye scorer -> EAR, gaze ratio, MAR
4. Temporal -> PERCLOS, blink rate, yawn
5. 10-d vector -> MLP or XGBoost -> focused / unfocused
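
Two of the features above, EAR and PERCLOS, can be sketched in a few lines. This uses the common six-landmark EAR formula and approximates PERCLOS as the fraction of recent frames with EAR below a threshold; the landmark ordering, the 0.2 threshold, the 90-frame window, and the class name are assumptions, not values taken from `eye_scorer`.

```python
import math
from collections import deque
from typing import Deque, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six eye landmarks ordered p1..p6.

    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 lie on the lower lid below p2 and p3 respectively.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


class Perclos:
    """Fraction of frames in a sliding window where the eye counts as closed."""

    def __init__(self, window: int = 90, closed_threshold: float = 0.2) -> None:
        self.closed: Deque[bool] = deque(maxlen=window)
        self.closed_threshold = closed_threshold

    def update(self, ear: float) -> float:
        self.closed.append(ear < self.closed_threshold)
        return sum(self.closed) / len(self.closed)


# Hypothetical landmark coordinates for an open eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ear = eye_aspect_ratio(open_eye)

perclos = Perclos(window=4)
for value in (0.3, 0.3, 0.1, 0.1):
    latest = perclos.update(value)
print(ear, latest)
```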

**Stack:** FastAPI, aiosqlite, React/Vite, PyTorch, XGBoost, MediaPipe, OpenCV, L2CS-Net.