metadata
license: mit
datasets:
- LeBabyOx/EEGParquet
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
- roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
- eeg
- seizure-detection
- biomedical
- time-series
- imbalanced-data
- healthcare
- classical-ml
🧠 Key Insights
- Tree-based models (RF, XGBoost) fail under extreme imbalance, predicting only the majority class.
- Linear models achieve high recall but suffer from extremely low precision.
- Threshold tuning significantly improves performance:
  - F1 improved from 0.0085 → 0.0769 (LogReg)
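As a minimal sketch of what threshold tuning involves, the snippet below sweeps candidate cutoffs over positive-class scores and keeps the one with the highest F1. It is not the procedure used to produce the figures above: the helper names `f1_at_threshold` and `best_f1_threshold` and the toy data are assumptions made for illustration. The threshold should be chosen on a validation split, never on the test set.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Try each distinct score as a cutoff; return (threshold, f1) with the highest F1."""
    best = (0.5, -1.0)
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best[1]:
            best = (t, f1)
    return best


# Toy validation data (illustrative only, not taken from EEGParquet).
y_val = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
p_val = [0.05, 0.10, 0.20, 0.55, 0.60, 0.62, 0.65, 0.70, 0.72, 0.90]

default_f1 = f1_at_threshold(y_val, p_val, 0.5)          # F1 at the usual 0.5 cutoff
threshold, tuned_f1 = best_f1_threshold(y_val, p_val)    # F1 after tuning
```

The sweep is quadratic in the number of windows, which is fine for a sketch; on the full dataset, scikit-learn's `precision_recall_curve` computes the same precision/recall pairs far more efficiently.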
⚙️ Usage
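import joblib
# Load the pre-trained logistic regression model from the models/ folder.
model = joblib.load("models/logistic_regression.joblib")
# X: feature matrix with the same columns and preprocessing used for training.
preds = model.predict(X)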
⚠️ Limitations
- Models struggle with extreme imbalance (~1600:1)
- Poor generalization across subjects (LOSO results)
- Classical ML is insufficient for robust seizure detection in this setting
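For readers unfamiliar with leave-one-subject-out (LOSO) evaluation, the sketch below shows the splitting idea: every window from one subject is held out for testing while the model trains on all other subjects. It is an illustration only and may differ from how the results here were produced; the subject IDs and the helper name `leave_one_subject_out` are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical subject labels, one per EEG window.
subjects = ["s01", "s01", "s02", "s03", "s03", "s03"]
folds = list(leave_one_subject_out(subjects))
```

No subject appears on both sides of a fold, so the score reflects performance on unseen patients rather than unseen windows. scikit-learn's `LeaveOneGroupOut` provides the same split when passed the subject IDs as `groups`.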
Citation
If you use this model, please cite:
@dataset{eegparquet_benchmark_2026,
title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
author={Daffa Tarigan},
year={2026},
publisher={Hugging Face}
}
Notes
This repository is intended for:
- Benchmarking classical ML under imbalance
- Demonstrating limitations of accuracy-based evaluation
- Supporting research in biomedical signal classification
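A short worked example shows why accuracy is uninformative at this level of imbalance. The counts below simply instantiate the approximate 1600:1 ratio with one positive window and are not the dataset's actual class counts.

```python
# Illustrative counts matching a 1600:1 ratio (not the real dataset sizes).
negatives, positives = 1600, 1

# Confusion matrix of a classifier that always predicts the majority class.
tp, fp, fn, tn = 0, 0, positives, negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 1600 / 1601
recall = tp / (tp + fn)                      # no seizure window is found
f1 = 2 * tp / (2 * tp + fp + fn)             # F1 collapses to zero
```

Accuracy exceeds 99.9% even though the classifier never detects a seizure, which is why F1, precision, recall, and ROC AUC are reported alongside it.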
1. Folder structure (important)
/models
├── logistic_regression.joblib
├── random_forest.joblib
├── svm_rbf_cuml_gpu.joblib
└── xgboost_gpu_optuna.joblib