Abdelrahman Almatrooshi
Deploy snapshot from main b7a59b11809483dfc959f196f1930240f2662c49
# models/xgboost
Gradient-boosted tree ensemble for binary focus classification. Primary ML model in the FocusGuard pipeline. Uses the same 10 selected features as the MLP.
## Configuration
Final hyperparameters selected from a 40-trial Optuna sweep:
| Parameter | Value | Source |
|-----------|-------|--------|
| n_estimators | 600 | `xgboost.n_estimators` |
| max_depth | 8 | `xgboost.max_depth` |
| learning_rate | 0.1489 | `xgboost.learning_rate` |
| subsample | 0.9625 | `xgboost.subsample` |
| colsample_bytree | 0.9013 | `xgboost.colsample_bytree` |
| reg_alpha | 1.1407 | `xgboost.reg_alpha` |
| reg_lambda | 2.4181 | `xgboost.reg_lambda` |
| eval_metric | logloss | `xgboost.eval_metric` |
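These config keys map one-to-one onto `XGBClassifier` keyword arguments. A minimal sketch of assembling them; the nesting under an `xgboost:` block mirrors the `xgboost.*` source paths above, but the exact layout of `config/default.yaml` is an assumption:

```python
# Sketch: the swept hyperparameters as they would appear after parsing the
# config file. The nested "xgboost" block is assumed from the table's paths.
config = {
    "xgboost": {
        "n_estimators": 600,
        "max_depth": 8,
        "learning_rate": 0.1489,
        "subsample": 0.9625,
        "colsample_bytree": 0.9013,
        "reg_alpha": 1.1407,
        "reg_lambda": 2.4181,
        "eval_metric": "logloss",
    }
}

# The keys are valid XGBClassifier kwargs, so they can be splatted directly:
xgb_kwargs = dict(config["xgboost"])
# model = xgboost.XGBClassifier(**xgb_kwargs)  # how train.py presumably builds the model
```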
## Training
```bash
python -m models.xgboost.train
```
Reads all parameters from `config/default.yaml`. Uses `XGBClassifier` with early stopping on validation logloss.
## Results
### Pooled random split (70/15/15)
| Accuracy | F1 | ROC-AUC |
|----------|-----|---------|
| 95.87% | 0.959 | 0.991 |
### LOPO cross-validation (9 participants)
| Metric | Value |
|--------|-------|
| LOPO AUC | 0.870 |
| Optimal threshold (Youden's J) | 0.280 |
| F1 at optimal threshold | 0.855 |
| F1 at default 0.50 | 0.832 |
| Improvement from threshold tuning | +2.3 pp |
The ~12 pp AUC drop from the pooled split (0.991) to LOPO (0.870) reflects temporal data leakage in the random split and underscores why person-independent evaluation matters.
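The optimal threshold is the one maximizing Youden's J = TPR − FPR over the validation scores. A self-contained sketch in pure Python (the project likely uses `sklearn.metrics.roc_curve` for this, which is an assumption):

```python
def youden_threshold(y_true, y_score):
    """Return the score threshold maximizing Youden's J = TPR - FPR."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    best_t, best_j = 0.5, float("-inf")
    for t in sorted(set(y_score)):          # every observed score is a candidate cut
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        j = tp / pos - fp / neg             # Youden's J at this threshold
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

Classifying with `score >= t*` instead of the default 0.50 is what yields the +2.3 pp F1 gain reported above.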
### Per-person LOPO (at t* = 0.280)
| Held-out | Acc | F1 | Prec | Rec |
|----------|-----|-----|------|-----|
| Abdelrahman | 0.864 | 0.900 | 0.904 | 0.896 |
| Jarek | 0.872 | 0.903 | 0.902 | 0.904 |
| Junhao | 0.890 | 0.901 | 0.841 | 0.971 |
| Kexin | 0.738 | 0.747 | 0.778 | 0.717 |
| Langyuan | 0.655 | 0.677 | 0.548 | 0.888 |
| Mohamed | 0.881 | 0.894 | 0.843 | 0.952 |
| Yingtao | 0.855 | 0.909 | 0.926 | 0.894 |
| Ayten | 0.841 | 0.905 | 0.861 | 0.954 |
| Saba | 0.923 | 0.925 | 0.956 | 0.896 |
| **Mean ± std** | **0.835 ± 0.080** | **0.862 ± 0.082** | **0.840 ± 0.115** | **0.897 ± 0.070** |
95% CI for mean F1: [0.799, 0.926]
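The summary row and CI can be reproduced from the nine per-person F1 scores with the stdlib `statistics` module. This sketch assumes the reported std is the population std and the CI is a t-interval with 8 degrees of freedom, which is what matches the numbers above:

```python
import statistics

# Per-person F1 scores from the LOPO table above
f1 = [0.900, 0.903, 0.901, 0.747, 0.677, 0.894, 0.909, 0.905, 0.925]

mean = statistics.mean(f1)            # ~0.862
std = statistics.pstdev(f1)           # ~0.082 (population std, matching the table)
t_crit = 2.306                        # t_{0.975, df=8}
margin = t_crit * std / len(f1) ** 0.5
ci = (mean - margin, mean + margin)   # ~(0.799, 0.926)
```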
### Feature importance (XGBoost gain)
Top 5: `s_face` (10.27), `ear_right` (9.54), `head_deviation` (8.83), `ear_avg` (6.96), `perclos` (5.68)
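The gain scores are exported to `evaluation/logs/xgboost_feature_importance.json` (see Outputs). A sketch of pulling a top-k list out of that file, assuming a flat `{feature: gain}` JSON mapping:

```python
import json

def top_features(path, k=5):
    """Return the k highest-gain (name, gain) pairs from a flat JSON mapping."""
    with open(path) as f:
        gains = json.load(f)
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)[:k]
```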
## ClearML integration
```bash
USE_CLEARML=1 python -m models.xgboost.train
```
Same ClearML enrichment as the MLP: hyperparameters, per-round scalars, confusion matrices, ROC curves, model registration, dataset stats, and reproducibility artifacts.
## Sweeps
### ClearML HPO (remote)
```bash
USE_CLEARML=1 python -m models.xgboost.sweep
```
Launches a `HyperParameterOptimizer` controller on ClearML that clones the base training task and runs grid/random search across workers.
### Local Optuna sweep
```bash
python -m models.xgboost.sweep_local
```
40 trials with a TPE sampler, optimizing LOPO F1. Search space: `n_estimators` [100, 1000], `max_depth` [3, 10], `learning_rate` [0.01, 0.3], `subsample` [0.6, 1.0], `colsample_bytree` [0.6, 1.0], `reg_alpha`/`reg_lambda` [0, 5].
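The search space can be written down explicitly. This sketch uses stdlib `random` in place of Optuna's TPE sampler just to make the bounds concrete; a real trial would call `trial.suggest_int` / `trial.suggest_float` with the same ranges (whether `learning_rate` is sampled log-uniformly is an assumption about `sweep_local.py`):

```python
import random

def sample_params(rng=random):
    """Draw one configuration uniformly from the sweep's search space."""
    return {
        "n_estimators": rng.randint(100, 1000),
        "max_depth": rng.randint(3, 10),
        "learning_rate": rng.uniform(0.01, 0.3),   # Optuna would typically sample this log-uniformly
        "subsample": rng.uniform(0.6, 1.0),
        "colsample_bytree": rng.uniform(0.6, 1.0),
        "reg_alpha": rng.uniform(0.0, 5.0),
        "reg_lambda": rng.uniform(0.0, 5.0),
    }
```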
### Export local sweep results from ClearML
```bash
python -m models.xgboost.fetch_sweep_results
```
Writes `models/xgboost/sweep_results_all_40.csv` using the metrics logged by `sweep_local.py` (`val_loss`, `val_accuracy`, `val_f1`) plus each trial's hyperparameters.
Default export behavior is strict and deterministic:
- only tasks named like `XGBoost Sweep Trial #...`
- only the latest 40 matching tasks
- skips tasks with missing or zero `val_loss` / `val_accuracy` / `val_f1`
- ranks by `val_f1` (then `val_loss`, then `val_accuracy`)
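The ranking rule above boils down to a single sort key. The tie-break directions (lower `val_loss` is better, higher `val_accuracy` is better) are an assumption consistent with the metric names:

```python
def rank_trials(trials):
    """Sort trial dicts best-first: val_f1 desc, then val_loss asc, then val_accuracy desc."""
    return sorted(trials, key=lambda t: (-t["val_f1"], t["val_loss"], -t["val_accuracy"]))
```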
Useful options:
```bash
python -m models.xgboost.fetch_sweep_results --limit 0 --keep-zero-metrics
python -m models.xgboost.fetch_sweep_results --name-prefix "XGBoost Sweep Trial #" --limit 40
python -m models.xgboost.fetch_sweep_results --compute-missing-val-accuracy --sort-by val_f1
```
## Outputs
| File | Location |
|------|----------|
| Best model | `checkpoints/xgboost_face_orientation_best.json` |
| Scaler | `checkpoints/scaler_xgboost.joblib` |
| Test predictions | `evaluation/logs/xgboost_test_predictions.csv` |
| Test metrics | `evaluation/logs/xgboost_test_metrics_summary.json` |
| Feature importance | `evaluation/logs/xgboost_feature_importance.json` |