---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch

model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583  # 65.83 %
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

A Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.

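The reported F1-macro is the unweighted mean of per-class F1 scores over the 39 phoneme classes, so rare phonemes count as much as frequent ones. As a reference, a minimal pure-Python sketch of the metric (the competition leaderboard used its own scorer):

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
    return sum(f1s) / len(f1s)
```

This matches scikit-learn's `f1_score(y_true, y_pred, average="macro")`.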
## Quick Start

### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download the best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval().to(device)

# Inference on a dummy MEG window: (batch, channels, time)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```

### Ensemble Inference (Recommended)

The ensemble combines the predictions of all 5 seeds and achieves the best performance; the example below uses a majority vote over the per-model predictions:
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)              # (1, C)
        probs = torch.softmax(logits, dim=1)    # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

# Stack predictions from all models: shape (num_models, batch_size)
preds = torch.stack(preds_list, dim=0)  # (M, 1)

# We have a single example in the batch, so index 0
per_model_preds = preds[:, 0]  # (M,)

num_classes = probs_list[0].size(1)
# Count votes per class
votes = torch.bincount(per_model_preds, minlength=num_classes).float()

# Majority-vote class (ties resolved by smallest index)
majority_class = int(votes.argmax().item())

# "Confidence" = fraction of models voting for the chosen class
confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
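
If calibrated class probabilities matter more than a hard label, an alternative is soft voting: average the per-model softmax outputs and take the argmax. A minimal sketch, using stand-in probability tensors in place of real model outputs:

```python
import torch

# Stand-in softmax outputs from three hypothetical models, shape (1, C) each
probs_list = [
    torch.tensor([[0.70, 0.20, 0.10]]),
    torch.tensor([[0.20, 0.60, 0.20]]),
    torch.tensor([[0.50, 0.30, 0.20]]),
]

# Average probabilities across models, then pick the top class
avg_probs = torch.stack(probs_list, dim=0).mean(dim=0)  # (1, C)
soft_pred = avg_probs.argmax(dim=1)                     # (1,)

print(f"Soft-voting prediction: {soft_pred.item()}")
print(f"Averaged confidence: {avg_probs[0, soft_pred].item():.2%}")
```

With the real models, `probs_list` would simply be the list already collected in the majority-vote example above.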

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample

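The input and grouping numbers above fit together as follows: 0.5 s at 250 Hz gives 125 time samples per window, and 100 single-trial windows of the same phoneme are averaged into one training input. A short sketch (tensor values are random stand-ins for real MEG data):

```python
import torch

SFREQ = 250        # MEG sampling rate in Hz
WINDOW_SEC = 0.5   # window length in seconds
N_CHANNELS = 306   # MEG channels
N_TRIALS = 100     # single-trial examples averaged per training sample

n_samples = int(SFREQ * WINDOW_SEC)  # 125 time samples per window

# Random stand-ins for 100 single-trial windows of the same phoneme
trials = torch.randn(N_TRIALS, N_CHANNELS, n_samples)

# Average across trials, then add a batch dimension: (1, 306, 125)
model_input = trials.mean(dim=0).unsqueeze(0)
```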
## Reproducibility

All 5 random seeds are provided. For best results on new data, we recommend the ensemble approach, which achieved **65.8% F1-macro** on the competition holdout set.

## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
  title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
  author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
  year={2025},
  eprint={2512.01443},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.01443},
}
```

## License

This model is released under the 3-Clause BSD License.

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)