# data_preparation/
Load and split the .npz data. Used by all training code and notebooks.
**prepare_dataset.py:** provides `load_all_pooled()`, `load_per_person()` (for leave-one-person-out, LOPO, evaluation), `get_numpy_splits()` (for XGBoost), and `get_dataloaders()` (for the MLP). It cleans yaw/pitch/roll and EAR to fixed ranges. Face_orientation uses 10 features: head_deviation, s_face, s_eye, h_gaze, pitch, ear_left, ear_avg, ear_right, gaze_offset, perclos.
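To illustrate what a LOPO split over per-person data looks like, here is a minimal sketch. It does not call `prepare_dataset.py`: the `lopo_folds` helper, the `Sample` type, and the toy dictionary are all hypothetical, and plain Python lists stand in for the arrays loaded from the .npz files. The actual return type of `load_per_person()` may differ.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample type: (feature vector, label).
Sample = Tuple[List[float], int]


def lopo_folds(
    per_person: Dict[str, List[Sample]],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_person, train_samples, test_samples), one fold per person."""
    for held_out in sorted(per_person):
        # Test on the held-out person only.
        test = list(per_person[held_out])
        # Train on the samples of everyone else.
        train = [
            sample
            for person, samples in per_person.items()
            if person != held_out
            for sample in samples
        ]
        yield held_out, train, test


# Hypothetical toy data: three people, two features per sample.
toy = {
    "p1": [([0.1, 0.2], 0), ([0.3, 0.4], 1)],
    "p2": [([0.5, 0.6], 1)],
    "p3": [([0.7, 0.8], 0), ([0.9, 1.0], 1)],
}

folds = list(lopo_folds(toy))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Each person appears in the test set exactly once and never in the training set of their own fold, which is what keeps the evaluation person-independent.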
**data_exploration.ipynb:** EDA: stats, class balance, histograms, correlations.
You don’t run `prepare_dataset.py` directly; it is imported by `models.mlp.train`, `models.xgboost.train`, and the notebooks.