---
license: mit
datasets:
  - LeBabyOx/EEGParquet
language:
  - en
metrics:
  - accuracy
  - f1
  - precision
  - recall
  - roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
  - eeg
  - seizure-detection
  - biomedical
  - time-series
  - imbalanced-data
  - healthcare
  - classical-ml
---

## 🧠 Key Insights

- Tree-based models (RF, XGBoost) fail under extreme imbalance, predicting only the majority class.
- Linear models achieve high recall but suffer from extremely low precision.
- Threshold tuning significantly improves performance:
  - F1 improved from 0.0085 → 0.0769 (LogReg)

---

## ⚙️ Usage

```python
import joblib

# X must be a feature matrix with the same columns and ordering used in training.
model = joblib.load("models/logistic_regression.joblib")
preds = model.predict(X)
```

## ⚠️ Limitations

- Models struggle with extreme class imbalance (~1600:1)
- Poor generalization across subjects (LOSO results)
- Classical ML is insufficient for robust seizure detection in this setting

## 📚 Citation

If you use this model, please cite:

```bibtex
@dataset{eegparquet_benchmark_2026,
  title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
  author={Daffa Tarigan},
  year={2026},
  publisher={Hugging Face}
}
```

## 🚀 Notes

This repository is intended for:

- Benchmarking classical ML under imbalance
- Demonstrating the limitations of accuracy-based evaluation
- Supporting research in biomedical signal classification

---

## 📁 Folder structure (important)

```
/models
├── logistic_regression.joblib
├── random_forest.joblib
├── svm_rbf_cuml_gpu.joblib
└── xgboost_gpu_optuna.joblib
```
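The threshold tuning behind the F1 improvement in Key Insights can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual pipeline: the dataset, split, and model settings here are synthetic stand-ins, and the imbalance is far milder than the real ~1600:1 setting.

```python
# Sketch: pick the decision threshold that maximizes F1 on a validation split,
# instead of using predict()'s implicit 0.5 cutoff.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (illustrative only).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.98, 0.02], random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)
probs = clf.predict_proba(X_val)[:, 1]

# Sweep all candidate thresholds and keep the one with the best F1.
prec, rec, thresh = precision_recall_curve(y_val, probs)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = thresh[np.argmax(f1[:-1])]  # last precision/recall point has no threshold

preds = (probs >= best).astype(int)
```

In a real deployment the tuned threshold would be selected on a held-out validation subject, never on the test data.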
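The pitfall of accuracy-based evaluation noted above can be made concrete with a back-of-the-envelope check: at roughly 1600:1 imbalance, a model that predicts "no seizure" for every window still scores ~99.94% accuracy while detecting nothing. The class counts below are illustrative, not the dataset's actual window counts.

```python
# Sketch: accuracy of a majority-class-only predictor under ~1600:1 imbalance.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

n_neg, n_pos = 160_000, 100  # ~1600:1, illustrative counts
y_true = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
y_pred = np.zeros_like(y_true)  # always predicts the majority (non-seizure) class

acc = accuracy_score(y_true, y_pred)                 # ~0.9994 despite zero utility
rec = recall_score(y_true, y_pred, zero_division=0)  # 0.0 — no seizure detected
print(f"accuracy={acc:.4f}, seizure recall={rec:.1f}")
```

This is why the model card reports F1, precision, recall, and ROC AUC rather than accuracy alone.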
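The LOSO (leave-one-subject-out) evaluation referenced in Limitations can be sketched with scikit-learn's `LeaveOneGroupOut`, holding out one subject's windows per fold. The data and subject IDs here are synthetic placeholders, not the benchmark's actual subjects.

```python
# Sketch: leave-one-subject-out cross-validation with per-fold F1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 1.0).astype(int)
groups = np.repeat(np.arange(6), 50)  # 6 synthetic "subjects", 50 windows each

scores = []
for tr, te in LeaveOneGroupOut().split(X, y, groups):
    # Train on all subjects except one; test on the held-out subject.
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    scores.append(f1_score(y[te], clf.predict(X[te]), zero_division=0))

print(f"mean LOSO F1: {np.mean(scores):.3f}")
```

Per-subject scores typically vary widely in practice, which is exactly the generalization gap the Limitations section describes.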