---
license: mit
datasets:
  - LeBabyOx/EEGParquet
language:
  - en
metrics:
  - accuracy
  - f1
  - precision
  - recall
  - roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
  - eeg
  - seizure-detection
  - biomedical
  - time-series
  - imbalanced-data
  - healthcare
  - classical-ml
---

🧠 Key Insights

- Tree-based models (Random Forest, XGBoost) fail under extreme class imbalance: they predict only the majority class.
- Linear models achieve high recall but extremely low precision.
- Threshold tuning improves F1 substantially in relative terms, though it stays low in absolute terms: 0.0085 → 0.0769 for logistic regression (LogReg).
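
As a minimal sketch of how such threshold tuning might be done, one can sweep candidate thresholds on held-out data with scikit-learn's `precision_recall_curve` and keep the one with the highest F1. The model card does not describe the exact procedure, so the synthetic data, the milder class ratio, and the F1-maximizing criterion below are all assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the EEG features (assumption: ~1% positives,
# far milder than the ~1600:1 ratio reported for the real data).
X, y = make_classification(
    n_samples=20000, n_features=20, n_informative=5,
    weights=[0.99, 0.01], random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# class_weight="balanced" tends to give a high-recall, low-precision linear model.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]

# precision and recall have one more entry than thresholds; drop the last point.
precision, recall, thresholds = precision_recall_curve(y_val, scores)
denom = np.clip(precision[:-1] + recall[:-1], 1e-12, None)
f1 = 2 * precision[:-1] * recall[:-1] / denom
best_threshold = thresholds[np.argmax(f1)]

# Compare the default 0.5 cut-off with the tuned one.
f1_default = f1_score(y_val, (scores >= 0.5).astype(int))
f1_tuned = f1_score(y_val, (scores >= best_threshold).astype(int))
print(f"default F1={f1_default:.4f}, tuned F1={f1_tuned:.4f} at t={best_threshold:.3f}")
```

Choosing and scoring the threshold on the same split, as this sketch does for brevity, gives an optimistic estimate; in practice the threshold would be picked on a validation split and reported on a separate test split.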

⚙️ Usage

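```python
import joblib

# Load the trained logistic regression model from the models/ folder.
model = joblib.load("models/logistic_regression.joblib")

# X: feature matrix with the same columns, in the same order, as used in training.
preds = model.predict(X)
```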

⚠️ Limitations

- Models struggle with extreme class imbalance (~1600:1).
- Generalization across subjects is poor, as shown by the leave-one-subject-out (LOSO) results.
- Classical ML is insufficient for robust seizure detection in this setting.
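
To illustrate what a leave-one-subject-out (LOSO) evaluation looks like, here is a minimal sketch using scikit-learn's `LeaveOneGroupOut`. The synthetic features, the `subject_ids` array, the per-subject offsets, and the choice of F1 as the per-fold metric are assumptions for illustration, not the benchmark's actual protocol:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Hypothetical data: 5 subjects, 400 windows each, 8 features per window.
n_subjects, n_windows, n_features = 5, 400, 8
subject_ids = np.repeat(np.arange(n_subjects), n_windows)
X = rng.normal(size=(n_subjects * n_windows, n_features))
# Each subject gets its own feature offset to mimic inter-subject variability.
X += rng.normal(scale=2.0, size=(n_subjects, n_features))[subject_ids]
# Rare positive class (~5% here) with a weak signal in the first feature.
y = (rng.random(len(X)) < 0.05).astype(int)
X[y == 1, 0] += 1.5

fold_f1 = {}
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject_ids):
    # Every window of one subject is held out; the model never sees that subject.
    held_out = int(subject_ids[test_idx][0])
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])
    fold_f1[held_out] = f1_score(
        y[test_idx], clf.predict(X[test_idx]), zero_division=0
    )

for subject, score in fold_f1.items():
    print(f"held-out subject {subject}: F1={score:.3f}")
```

Grouping by subject keeps windows from the same recording out of both the training and test sets, which is what makes LOSO a test of cross-subject generalization instead of within-subject memorization.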

📚 Citation

If you use this model, please cite:

@dataset{eegparquet_benchmark_2026,
  title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
  author={Daffa Tarigan},
  year={2026},
  publisher={Hugging Face}
}

🚀 Notes

This repository is intended for:

- Benchmarking classical ML under class imbalance
- Demonstrating the limitations of accuracy-based evaluation
- Supporting research in biomedical signal classification
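
The limitation of accuracy-based evaluation follows from simple arithmetic. As a worked sketch, a classifier that always predicts the majority class scores near-perfect accuracy while detecting no seizures. The window counts below are hypothetical, chosen only to match the ~1600:1 ratio listed under Limitations:

```python
# Hypothetical counts matching a ~1600:1 class ratio.
n_negative = 160_000  # non-seizure windows
n_positive = 100      # seizure windows

# Confusion-matrix entries for a classifier that always predicts "non-seizure".
tp, fp = 0, 0
tn, fn = n_negative, n_positive

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
# Precision is undefined with no positive predictions; treat it as 0.
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
f1 = (
    2 * precision * recall / (precision + recall)
    if (precision + recall) > 0
    else 0.0
)

print(f"accuracy={accuracy:.4f}, recall={recall:.1f}, f1={f1:.1f}")
# prints: accuracy=0.9994, recall=0.0, f1=0.0
```

This is why the metrics listed for this repository include F1, precision, recall, and ROC AUC alongside accuracy.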

1. Folder structure (important)

/models
├── logistic_regression.joblib
├── random_forest.joblib
├── svm_rbf_cuml_gpu.joblib
├── xgboost_gpu_optuna.joblib