Data splits (train/val/test)
#1 (pinned) · opened by PierreGtch
For the first iteration of this benchmark, we use predefined train/validation/test splits, mostly aimed at assessing generalization to unseen subjects. We follow the practice established in recent foundation model papers (REVE, CBraMod, and Labram), where subjects are stratified across splits. The exceptions are SEED-V and BCIC-2020-3, where task difficulty would make strict cross-subject transfer impossible; for those datasets we instead use the within-session splits provided in the original publications.
The exact split details are as follows:
- MAT (mental stress) Train subjects 0–27; validation subjects 28–31; test subjects 32–35.
- FACED (emotion recognition) Train subjects 0–79; validation subjects 80–99; test subjects 100–122.
- PhysioNet-MI (motor imagery) Train subjects 1–70; validation subjects 71–89; test subjects 90–109.
- BCIC-2020-3 (imagined speech) Within-session split by recording run, according to the original competition rules: train = run 0; validation = run 1; test = run 2.
- ISRUC (sleep staging) Train = I001–I080 (group I, with a small number of exclusions noted in the dataset config); validation = I081–I090; test = I091–I100.
- BCIC-IV-2a (motor imagery) Train subjects: 1, 2, 3; validation subjects: 4, 5, 6; test subjects: 7, 8, 9.
- SEED-V (emotion recognition) Session-based split: train = session 1; validation = session 2; test = session 3.
- TUAB (abnormal events detection) Original split: subjects labeled 'train=true' are used for training; subjects labeled 'train=false' are used for testing. The validation set is obtained from the training records via a nested cross-subject split (5 folds, approximately 80%/20% per fold, stratified).
- Mumtaz (mental disorder) Predefined subject lists are used: training subjects: H1, H2, H10–22, MDD1, MDD2, MDD10–21; validation subjects: H23–25, MDD22–25; test subjects: H3–9, H26–30, MDD3–9, MDD26–34.
- TUEV (event classification) Original split: subjects labeled 'train' are used for training and 'eval' for testing. The validation set is derived from the 'train' group using a nested cross-subject split (5 folds; each fold splits training subjects approximately 80%/20%).
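For TUAB and TUEV, the validation set is carved out of the official training subjects by a nested cross-subject split. A minimal sketch of that idea, assuming subject-level labels and hand-rolling the stratified 5-fold grouping (function and variable names are illustrative, not from the benchmark code):

```python
import random
from collections import defaultdict

def nested_subject_split(subject_labels, n_folds=5, fold=0, seed=0):
    """Hold out one fold of subjects (~1/n_folds, here ~20%) for validation.

    subject_labels: dict mapping subject_id -> class label, covering the
    official training subjects. Subjects are dealt round-robin within each
    class so every fold keeps roughly the original label proportions
    (stratified), and no subject appears in both returned sets
    (cross-subject).
    """
    by_class = defaultdict(list)
    for subj, label in sorted(subject_labels.items()):
        by_class[label].append(subj)

    rng = random.Random(seed)  # fixed seed -> reproducible folds
    folds = [[] for _ in range(n_folds)]
    for label, subjects in by_class.items():
        rng.shuffle(subjects)
        for i, subj in enumerate(subjects):
            folds[i % n_folds].append(subj)

    val_subjects = set(folds[fold])
    train_subjects = {s for s in subject_labels if s not in val_subjects}
    return train_subjects, val_subjects
```

Any of the five folds can serve as the validation set; the key property is that the train/validation boundary falls between subjects, never within one subject's recordings.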
Feedback is highly welcome before we freeze the benchmark design decisions :)