metadata
license: mit
datasets:
  - LeBabyOx/EEGParquet
language:
  - en
metrics:
  - accuracy
  - f1
  - precision
  - recall
  - roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
  - eeg
  - seizure-detection
  - biomedical
  - time-series
  - imbalanced-data
  - healthcare
  - classical-ml

🧠 Key Insights

  • Tree-based models (RF, XGBoost) fail under extreme imbalance, predicting only the majority class.
  • Linear models achieve high recall but suffer from extremely low precision.
  • Threshold tuning significantly improves performance:
    • F1 improved from 0.0085 → 0.0769 (LogReg)
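
The threshold tuning mentioned above can be sketched as a sweep over candidate decision thresholds, keeping the one with the highest F1 on a validation set. This is a minimal, dependency-free illustration, not the procedure used to produce the reported numbers; the function names, the toy labels, and the toy scores are all made up for the example.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when predicting 1 where score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, scores, candidates):
    """Return the (threshold, f1) pair with the highest F1 among the candidates."""
    pairs = ((t, f1_at_threshold(y_true, scores, t)) for t in candidates)
    return max(pairs, key=lambda pair: pair[1])


# Toy validation data (hypothetical): 2 positive windows among 10.
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.55, 0.60, 0.30, 0.52, 0.40, 0.58, 0.90, 0.80]

default_f1 = f1_at_threshold(y_val, scores, 0.5)
threshold, tuned_f1 = best_threshold(y_val, scores, [i / 20 for i in range(1, 20)])
print(f"F1 at 0.5: {default_f1:.2f}, best threshold: {threshold:.2f}, F1: {tuned_f1:.2f}")
```

With a real model, the scores would typically come from the positive-class column of `predict_proba`, and the threshold should be chosen on held-out data rather than on the test set.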

⚙️ Usage

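import joblib

# Load the trained logistic regression model from the models/ folder
model = joblib.load("models/logistic_regression.joblib")

# X must be a feature matrix with the same columns, in the same order, as used for training
preds = model.predict(X)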
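
Because `predict` applies the default decision threshold, a tuned threshold (see Key Insights) has to be applied to probabilities by hand. The sketch below assumes the saved model exposes `predict_proba` in the usual scikit-learn form; `predict_with_threshold`, the stub model, and the threshold value 0.9 are hypothetical and only serve to make the example runnable without the model file.

```python
def predict_with_threshold(model, X, threshold=0.5):
    """Label a window as seizure (1) when P(class 1) >= threshold.

    Assumes model.predict_proba(X) returns rows of [P(class 0), P(class 1)],
    as scikit-learn classifiers do for binary problems.
    """
    return [int(row[1] >= threshold) for row in model.predict_proba(X)]


class _StubModel:
    """Hypothetical stand-in so the sketch runs without the saved model file."""

    def predict_proba(self, X):
        # Each "feature row" here is simply a made-up positive-class probability.
        return [[1.0 - p, p] for p in X]


labels_default = predict_with_threshold(_StubModel(), [0.2, 0.6, 0.95])
labels_strict = predict_with_threshold(_StubModel(), [0.2, 0.6, 0.95], threshold=0.9)
print(labels_default, labels_strict)
```

In practice the stub would be replaced by the object returned from `joblib.load`, and the threshold by the value selected on validation data.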

⚠️ Limitations

  • Models struggle with extreme imbalance (~1600:1)
  • Poor generalization across subjects (LOSO results)
  • Classical ML is insufficient for robust seizure detection in this setting
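
For readers unfamiliar with LOSO (leave-one-subject-out) evaluation, the splitting scheme can be sketched as follows: each subject is held out once as the test set while all other subjects form the training set. This is an illustrative, dependency-free version; the function name and subject IDs are made up, and scikit-learn's `LeaveOneGroupOut` offers the same kind of split.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) once per distinct subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical subject label for each EEG window, in dataset order.
subjects = ["s01", "s01", "s02", "s03", "s03", "s03"]
splits = list(leave_one_subject_out(subjects))

for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Since no window from the held-out subject is seen during training, scores under this scheme reflect generalization to unseen subjects rather than to unseen windows of known subjects.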

📚 Citation

If you use this model, please cite:

@dataset{eegparquet_benchmark_2026,
  title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
  author={Daffa Tarigan},
  year={2026},
  publisher={Hugging Face}
}

🚀 Notes

This repository is intended for:

  • Benchmarking classical ML under imbalance
  • Demonstrating limitations of accuracy-based evaluation
  • Supporting research in biomedical signal classification
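
To make the limitation of accuracy-based evaluation concrete, the sketch below computes metrics for a classifier that always predicts the majority (non-seizure) class at a 1600:1 ratio. The counts are illustrative, derived only from the approximate ratio quoted in this card, and `metrics_from_counts` is a hypothetical helper.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Always-negative classifier on 1600 non-seizure windows and 1 seizure window.
baseline = metrics_from_counts(tp=0, fp=0, fn=1, tn=1600)
print(baseline)
```

Accuracy here is 1600/1601, roughly 0.9994, while recall and F1 are both zero, which is why F1, precision, recall, and ROC AUC are more informative than accuracy in this setting.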


1. Folder structure (important)

/models
├── logistic_regression.joblib
├── random_forest.joblib
├── svm_rbf_cuml_gpu.joblib
├── xgboost_gpu_optuna.joblib