---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch

model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583  # 65.83 %
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

A Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.

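The reported F1-macro is the unweighted mean of per-class F1 scores over the 39 phoneme classes, so rare phonemes count as much as frequent ones. As a reference, a minimal pure-Python sketch of the metric (the competition leaderboard used its own scorer):

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
    return sum(f1s) / len(f1s)
```

This matches scikit-learn's `f1_score(y_true, y_pred, average="macro")`.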
## Quick Start

### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download the best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval().to(device)

# Inference on a dummy MEG window: (batch, channels, time)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```

### Ensemble Inference (Recommended)

The ensemble combines the predictions of all 5 seeds and achieves the best performance; the example below uses a majority vote over the per-model predictions:
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)              # (1, C)
        probs = torch.softmax(logits, dim=1)    # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

# Stack predictions from all models: shape (num_models, batch_size)
preds = torch.stack(preds_list, dim=0)  # (M, 1)

# We have a single example in the batch, so index 0
per_model_preds = preds[:, 0]  # (M,)

num_classes = probs_list[0].size(1)
# Count votes per class
votes = torch.bincount(per_model_preds, minlength=num_classes).float()

# Majority-vote class (ties resolved by smallest index)
majority_class = int(votes.argmax().item())

# "Confidence" = fraction of models voting for the chosen class
confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
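
If calibrated class probabilities matter more than a hard label, an alternative is soft voting: average the per-model softmax outputs and take the argmax. A minimal sketch, using stand-in probability tensors in place of real model outputs:

```python
import torch

# Stand-in softmax outputs from three hypothetical models, shape (1, C) each
probs_list = [
    torch.tensor([[0.70, 0.20, 0.10]]),
    torch.tensor([[0.20, 0.60, 0.20]]),
    torch.tensor([[0.50, 0.30, 0.20]]),
]

# Average probabilities across models, then pick the top class
avg_probs = torch.stack(probs_list, dim=0).mean(dim=0)  # (1, C)
soft_pred = avg_probs.argmax(dim=1)                     # (1,)

print(f"Soft-voting prediction: {soft_pred.item()}")
print(f"Averaged confidence: {avg_probs[0, soft_pred].item():.2%}")
```

With the real models, `probs_list` would simply be the list already collected in the majority-vote example above.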

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample

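The input and grouping numbers above fit together as follows: 0.5 s at 250 Hz gives 125 time samples per window, and 100 single-trial windows of the same phoneme are averaged into one training input. A short sketch (tensor values are random stand-ins for real MEG data):

```python
import torch

SFREQ = 250        # MEG sampling rate in Hz
WINDOW_SEC = 0.5   # window length in seconds
N_CHANNELS = 306   # MEG channels
N_TRIALS = 100     # single-trial examples averaged per training sample

n_samples = int(SFREQ * WINDOW_SEC)  # 125 time samples per window

# Random stand-ins for 100 single-trial windows of the same phoneme
trials = torch.randn(N_TRIALS, N_CHANNELS, n_samples)

# Average across trials, then add a batch dimension: (1, 306, 125)
model_input = trials.mean(dim=0).unsqueeze(0)
```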
## Reproducibility

All 5 random seeds are provided. For best results on new data, we recommend the ensemble approach, which achieved **65.8% F1-macro** on the competition holdout set.

## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
  title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
  author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
  year={2025},
  eprint={2512.01443},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.01443},
}
```

## License

This model is released under the 3-Clause BSD License.

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)