---
license: mit
datasets:
  - LeBabyOx/EEGParquet
language:
  - en
metrics:
  - accuracy
  - f1
  - precision
  - recall
  - roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
  - eeg
  - seizure-detection
  - biomedical
  - time-series
  - imbalanced-data
  - healthcare
  - classical-ml
---

## 🧠 Key Insights

- Tree-based models (RF, XGBoost) fail under extreme imbalance, predicting only the majority class.
- Linear models achieve high recall but suffer from extremely low precision.
- Threshold tuning significantly improves performance:
  - F1 improved from 0.0085 → 0.0769 (LogReg)

---

## ⚙️ Usage

```python
import joblib

# X must be a feature matrix with the same columns and ordering used in training.
model = joblib.load("models/logistic_regression.joblib")
preds = model.predict(X)
```

## ⚠️ Limitations

- Models struggle with extreme class imbalance (~1600:1)
- Poor generalization across subjects (LOSO results)
- Classical ML is insufficient for robust seizure detection in this setting

## 📚 Citation

If you use this model, please cite:

```bibtex
@dataset{eegparquet_benchmark_2026,
  title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
  author={Daffa Tarigan},
  year={2026},
  publisher={Hugging Face}
}
```

## 🚀 Notes

This repository is intended for:

- Benchmarking classical ML under imbalance
- Demonstrating the limitations of accuracy-based evaluation
- Supporting research in biomedical signal classification

---

## 📁 Folder structure (important)

```
/models
├── logistic_regression.joblib
├── random_forest.joblib
├── svm_rbf_cuml_gpu.joblib
└── xgboost_gpu_optuna.joblib
```
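The threshold tuning behind the F1 improvement in Key Insights can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual pipeline: the dataset, split, and model settings here are synthetic stand-ins, and the imbalance is far milder than the real ~1600:1 setting.

```python
# Sketch: pick the decision threshold that maximizes F1 on a validation split,
# instead of using predict()'s implicit 0.5 cutoff.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (illustrative only).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.98, 0.02], random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)
probs = clf.predict_proba(X_val)[:, 1]

# Sweep all candidate thresholds and keep the one with the best F1.
prec, rec, thresh = precision_recall_curve(y_val, probs)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = thresh[np.argmax(f1[:-1])]  # last precision/recall point has no threshold

preds = (probs >= best).astype(int)
```

In a real deployment the tuned threshold would be selected on a held-out validation subject, never on the test data.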
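The pitfall of accuracy-based evaluation noted above can be made concrete with a back-of-the-envelope check: at roughly 1600:1 imbalance, a model that predicts "no seizure" for every window still scores ~99.94% accuracy while detecting nothing. The class counts below are illustrative, not the dataset's actual window counts.

```python
# Sketch: accuracy of a majority-class-only predictor under ~1600:1 imbalance.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

n_neg, n_pos = 160_000, 100  # ~1600:1, illustrative counts
y_true = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
y_pred = np.zeros_like(y_true)  # always predicts the majority (non-seizure) class

acc = accuracy_score(y_true, y_pred)                 # ~0.9994 despite zero utility
rec = recall_score(y_true, y_pred, zero_division=0)  # 0.0 — no seizure detected
print(f"accuracy={acc:.4f}, seizure recall={rec:.1f}")
```

This is why the model card reports F1, precision, recall, and ROC AUC rather than accuracy alone.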
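The LOSO (leave-one-subject-out) evaluation referenced in Limitations can be sketched with scikit-learn's `LeaveOneGroupOut`, holding out one subject's windows per fold. The data and subject IDs here are synthetic placeholders, not the benchmark's actual subjects.

```python
# Sketch: leave-one-subject-out cross-validation with per-fold F1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 1.0).astype(int)
groups = np.repeat(np.arange(6), 50)  # 6 synthetic "subjects", 50 windows each

scores = []
for tr, te in LeaveOneGroupOut().split(X, y, groups):
    # Train on all subjects except one; test on the held-out subject.
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    scores.append(f1_score(y[te], clf.predict(X[te]), zero_division=0))

print(f"mean LOSO F1: {np.mean(scores):.3f}")
```

Per-subject scores typically vary widely in practice, which is exactly the generalization gap the Limitations section describes.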