---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch
model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

A Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.

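For reference, F1-macro is the unweighted mean of per-class F1 scores, so rare phoneme classes count as much as frequent ones. A minimal pure-Python sketch with toy labels (not the project's evaluation code):

```python
def f1_macro(y_true, y_pred, num_classes):
    """Macro F1: unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes

# Toy example with 3 classes
print(f1_macro([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 2], 3))  # ≈ 0.8222
```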
## Quick Start

### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval().to(device)

# Inference
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```
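To turn a class index into a phoneme label, you need the index-to-label mapping, which is defined by the LibriBrain dataset/training pipeline and not reproduced here. As an illustration only, the standard 39-phoneme ARPAbet inventory in alphabetical order:

```python
# Standard 39-phoneme ARPAbet inventory (alphabetical order).
# NOTE: the model's actual index-to-label mapping comes from the
# LibriBrain dataset; this ordering is shown purely as an illustration.
ARPABET_39 = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]

predicted_class = 17  # e.g. the prediction.item() from above
print(f"Class {predicted_class} -> {ARPABET_39[predicted_class]}")  # IY (illustrative)
```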

### Ensemble Inference (Recommended)

The ensemble combines the predictions of all 5 seeds by majority vote and achieves the best performance:
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)              # (1, C)
        probs = torch.softmax(logits, dim=1)    # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

# Stack predictions from all models: shape (num_models, batch_size)
preds = torch.stack(preds_list, dim=0)  # (M, 1)

# We have a single example in the batch, so index 0
per_model_preds = preds[:, 0]  # (M,)

num_classes = probs_list[0].size(1)
# Count votes per class
votes = torch.bincount(per_model_preds, minlength=num_classes).float()

# Majority-vote class (ties resolved by smallest index)
majority_class = int(votes.argmax().item())

# "Confidence" = fraction of models voting for the chosen class
confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
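If you prefer soft voting, average the softmax probabilities across models and take the argmax instead of counting votes. A self-contained sketch, with random logits standing in for the 5 models' outputs:

```python
import torch

torch.manual_seed(0)
num_models, batch, num_classes = 5, 1, 39

# Stand-in for the per-model logits computed in the loop above
all_logits = torch.randn(num_models, batch, num_classes)

# Soft voting: average class probabilities across models
probs = torch.softmax(all_logits, dim=-1)  # (M, B, C)
avg_probs = probs.mean(dim=0)              # (B, C)
prediction = avg_probs.argmax(dim=1)       # (B,)

print(f"Soft-voting predicted phoneme class: {prediction.item()}")
print(f"Averaged probability: {avg_probs[0, prediction].item():.2%}")
```

Soft voting uses each model's full probability distribution, so it can break ties and reflect model confidence, whereas majority voting only counts top-1 predictions.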

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample

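The window and grouping settings above translate directly into tensor shapes. A minimal sketch with synthetic data (the actual grouping is done by the dataset pipeline):

```python
import torch

sfreq = 250          # sampling rate in Hz
window_s = 0.5       # window length in seconds
n_channels = 306     # MEG channels
n_samples = int(sfreq * window_s)  # 0.5 s at 250 Hz -> 125 samples

# Grouping: average 100 single-trial windows of the same phoneme into
# one training sample to boost SNR (synthetic stand-in data here).
trials = torch.randn(100, n_channels, n_samples)
training_sample = trials.mean(dim=0, keepdim=True)  # (1, 306, 125)

print(training_sample.shape)  # torch.Size([1, 306, 125])
```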
## Reproducibility

All 5 random seeds are provided. For best results on new data, we recommend using the ensemble approach, which achieved **65.8% F1-macro** on the competition holdout set.

## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
  title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
  author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
  year={2025},
  eprint={2512.01443},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.01443},
}
```

## License

The 3-Clause BSD License

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)