---
license: mit
datasets:
- LeBabyOx/EEGParquet
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
- roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
- eeg
- seizure-detection
- biomedical
- time-series
- imbalanced-data
- healthcare
- classical-ml
---
## 🧠 Key Insights
- Tree-based models (RF, XGBoost) fail under extreme imbalance, predicting only the majority class.
- Linear models achieve high recall but suffer from extremely low precision.
- Threshold tuning significantly improves performance:
  - F1 improved from 0.0085 → 0.0769 (LogReg)
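
The threshold tuning referred to above can be sketched with scikit-learn's `precision_recall_curve`: sweep candidate decision thresholds on held-out probabilities and keep the one that maximizes F1 instead of the default 0.5 cut. The data below is a synthetic, highly imbalanced stand-in for the EEG feature windows, not the actual EEGParquet split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in for windowed EEG features (~0.5% positives).
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.995, 0.005], random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

# Sweep decision thresholds over validation probabilities and pick the
# one maximizing F1, instead of relying on the default 0.5 cutoff.
prec, rec, thresholds = precision_recall_curve(
    y_val, clf.predict_proba(X_val)[:, 1]
)
f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
best_t = thresholds[np.argmax(f1[:-1])]  # f1[:-1] aligns with thresholds
print(f"best threshold: {best_t:.3f}, F1 at best: {f1[:-1].max():.4f}")
```

The tuned threshold is then applied at inference time in place of `model.predict()`.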
---
## βš™οΈ Usage
```python
import joblib

# Load the serialized scikit-learn estimator.
model = joblib.load("models/logistic_regression.joblib")

# X: feature matrix with the same columns used during training.
preds = model.predict(X)
```
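
Since the key results above depend on threshold tuning, inference should usually go through `predict_proba` with the tuned cutoff rather than `predict()`'s default 0.5. A minimal sketch follows; the stand-in model, data, and the `THRESHOLD` value are all illustrative (in practice, load the shipped `.joblib` artifact and use the threshold selected on your validation split).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in estimator; in practice:
#   model = joblib.load("models/logistic_regression.joblib")
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.zeros(100, dtype=int)
y[:5] = 1  # a few rare "seizure" windows
model = LogisticRegression(max_iter=1000).fit(X, y)

# Apply a tuned decision threshold instead of predict()'s default 0.5.
THRESHOLD = 0.2  # illustrative value, not from the benchmark
proba = model.predict_proba(X)[:, 1]   # probability of the seizure class
preds = (proba >= THRESHOLD).astype(int)
```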
## ⚠️ Limitations
- Models struggle with extreme class imbalance (~1600:1).
- Poor generalization across subjects (LOSO results).
- Classical ML is insufficient for robust seizure detection in this setting.
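
The LOSO (leave-one-subject-out) protocol mentioned above can be sketched with scikit-learn's `LeaveOneGroupOut`: each fold trains on all subjects but one and tests on the held-out subject, which is what exposes the cross-subject generalization gap. Subjects, features, and labels below are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n = 1200
X = rng.normal(size=(n, 16))
y = (rng.random(n) < 0.02).astype(int)   # rare seizure windows
subjects = rng.integers(0, 6, size=n)    # 6 synthetic subject IDs

# One fold per subject: train on 5 subjects, evaluate on the 6th.
scores = []
for tr, te in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X[tr], y[tr])
    scores.append(f1_score(y[te], clf.predict(X[te]), zero_division=0))

print(f"per-subject F1: {np.round(scores, 3)}")
```

Reporting the per-subject spread, rather than a single pooled score, is what makes the generalization failure visible.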
## 📚 Citation
If you use this model, please cite:
```
@dataset{eegparquet_benchmark_2026,
  title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
  author={Daffa Tarigan},
  year={2026},
  publisher={Hugging Face}
}
```
## 🚀 Notes
This repository is intended for:
- Benchmarking classical ML under imbalance
- Demonstrating limitations of accuracy-based evaluation
- Supporting research in biomedical signal classification
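
The second point is easy to demonstrate with synthetic numbers at the ~1600:1 ratio noted above: a classifier that never predicts "seizure" scores near-perfect accuracy while being useless, which F1 exposes immediately.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# ~1600:1 imbalance: 10 seizure windows among 16,010 total.
y_true = np.zeros(16_010, dtype=int)
y_true[:10] = 1
y_pred = np.zeros_like(y_true)  # degenerate model: always "no seizure"

print(f"accuracy: {accuracy_score(y_true, y_pred):.4f}")              # 0.9994
print(f"F1:       {f1_score(y_true, y_pred, zero_division=0):.4f}")   # 0.0000
```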
---
## 1. Folder structure (important)
```
/models
├── logistic_regression.joblib
├── random_forest.joblib
├── svm_rbf_cuml_gpu.joblib
└── xgboost_gpu_optuna.joblib
```