Data splits (train/val/test)

#1
by PierreGtch - opened
Braindecode org
•
edited 12 days ago

For the first iteration of this benchmark, we employ predefined train/validation/test splits, mostly focused on assessing generalization to unseen subjects. We followed practices established in recent foundation model papers (REVE, CBraMod, and LaBraM), where subjects are stratified across splits. The exceptions are SEED-V and BCIC-2020-3, where task difficulty would make strict cross-subject transfer impossible; for these, we used the within-session split provided in the original publications.
The exact splits are as follows:

  • MAT (mental stress): Train subjects 0–27; validation subjects 28–31; test subjects 32–35.
  • FACED (emotion recognition): Train subjects 0–79; validation subjects 80–99; test subjects 100–122.
  • PhysioNet-MI (motor imagery): Train subjects 1–70; validation subjects 71–89; test subjects 90–109.
  • BCIC-2020-3 (imagined speech): Within-session split by recording run, according to the original competition rules: train = run 0; validation = run 1; test = run 2.
  • ISRUC (sleep staging): Train = I001–I080 (group I, with a small number of exclusions noted in the dataset config); validation = I081–I090; test = I091–I100.
  • BCIC-IV-2a (motor imagery): Train subjects: 1, 2, 3; validation subjects: 4, 5, 6; test subjects: 7, 8, 9.
  • SEED-V (emotion recognition): Session-based split: train = session 1; validation = session 2; test = session 3.
  • TUAB (abnormal events detection): Original split: subjects labeled 'train=true' are used for training; subjects labeled 'train=false' are used for testing. The validation set is obtained from the training records via a nested cross-subject split (5 folds, approximately 80%/20% per fold, stratified).
  • Mumtaz (mental disorder): Predefined subject lists are used: training subjects: H1, H2, H10–22, MDD1, MDD2, MDD10–21; validation subjects: H23–25, MDD22–25; test subjects: H3–9, H26–30, MDD3–9, MDD26–34.
  • TUEV (event classification): Original split: subjects labeled 'train' are used for training and 'eval' for testing. The validation set is derived from the 'train' group using a nested cross-subject split (5 folds; each fold splits training subjects approximately 80%/20%).
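
To make the list above easier to check, here is a minimal Python sketch, not the benchmark's actual code. It encodes a few of the subject-range splits and verifies that no subject appears in more than one split, then shows one possible way to build the 5-fold cross-subject validation folds described for TUAB and TUEV. The names `SUBJECT_SPLITS`, `check_disjoint` and `cross_subject_folds` are hypothetical, and the round-robin stratification and the seed are assumptions: the post does not say how subjects are assigned to folds.

```python
import random
from collections import defaultdict

# Subject ranges copied from the list above (bounds in the post are inclusive,
# so the Python range end is the last subject + 1).
SUBJECT_SPLITS = {
    "MAT": {"train": range(0, 28), "validation": range(28, 32), "test": range(32, 36)},
    "FACED": {"train": range(0, 80), "validation": range(80, 100), "test": range(100, 123)},
    "PhysioNet-MI": {"train": range(1, 71), "validation": range(71, 90), "test": range(90, 110)},
    "BCIC-IV-2a": {"train": [1, 2, 3], "validation": [4, 5, 6], "test": [7, 8, 9]},
}


def check_disjoint(splits):
    """Return the total subject count; raise if a subject is in two splits."""
    seen = {}
    for name, subjects in splits.items():
        for subject in subjects:
            if subject in seen:
                raise ValueError(f"subject {subject} is in both {seen[subject]} and {name}")
            seen[subject] = name
    return len(seen)


def cross_subject_folds(subject_labels, n_folds=5, seed=0):
    """Yield (train_subjects, val_subjects) pairs, one per fold.

    subject_labels maps subject id -> class label. Subjects of each class are
    shuffled and dealt round-robin into folds (an assumed scheme), so each fold
    holds roughly 1/n_folds of every class and the two parts never share a subject.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for subject, label in sorted(subject_labels.items()):
        by_label[label].append(subject)

    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_label):
        subjects = by_label[label]
        rng.shuffle(subjects)
        for subject in subjects:
            folds[position % n_folds].append(subject)
            position += 1

    for k in range(n_folds):
        val = sorted(folds[k])
        train = sorted(s for j, fold in enumerate(folds) if j != k for s in fold)
        yield train, val


if __name__ == "__main__":
    for dataset, splits in SUBJECT_SPLITS.items():
        print(dataset, check_disjoint(splits), "subjects, no overlap")
```

With `n_folds=5`, each fold keeps about 80% of the original training subjects for training and holds out about 20% for validation, which matches the ratio stated for TUAB and TUEV. The original test subjects are never passed to `cross_subject_folds`.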

Feedback is highly welcome before we freeze the benchmark design decisions :)

