---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch
model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

A Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.

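For reference, F1-macro is the unweighted mean of per-class F1 scores, so rare phoneme classes count as much as frequent ones. A minimal pure-Python sketch with toy labels (not the project's evaluation code):

```python
def f1_macro(y_true, y_pred, num_classes):
    """Macro F1: unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes

# Toy example with 3 classes
print(f1_macro([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 2], 3))  # ≈ 0.8222
```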
## Quick Start

### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval().to(device)

# Inference
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```
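To turn a class index into a phoneme label, you need the index-to-label mapping, which is defined by the LibriBrain dataset/training pipeline and not reproduced here. As an illustration only, the standard 39-phoneme ARPAbet inventory in alphabetical order:

```python
# Standard 39-phoneme ARPAbet inventory (alphabetical order).
# NOTE: the model's actual index-to-label mapping comes from the
# LibriBrain dataset; this ordering is shown purely as an illustration.
ARPABET_39 = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]

predicted_class = 17  # e.g. the prediction.item() from above
print(f"Class {predicted_class} -> {ARPABET_39[predicted_class]}")  # IY (illustrative)
```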

### Ensemble Inference (Recommended)

The ensemble combines the predictions of all 5 seeds by majority vote and achieves the best performance:
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)              # (1, C)
        probs = torch.softmax(logits, dim=1)    # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

# Stack predictions from all models: shape (num_models, batch_size)
preds = torch.stack(preds_list, dim=0)  # (M, 1)

# We have a single example in the batch, so index 0
per_model_preds = preds[:, 0]  # (M,)

num_classes = probs_list[0].size(1)
# Count votes per class
votes = torch.bincount(per_model_preds, minlength=num_classes).float()

# Majority-vote class (ties resolved by smallest index)
majority_class = int(votes.argmax().item())

# "Confidence" = fraction of models voting for the chosen class
confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
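If you prefer soft voting, average the softmax probabilities across models and take the argmax instead of counting votes. A self-contained sketch, with random logits standing in for the 5 models' outputs:

```python
import torch

torch.manual_seed(0)
num_models, batch, num_classes = 5, 1, 39

# Stand-in for the per-model logits computed in the loop above
all_logits = torch.randn(num_models, batch, num_classes)

# Soft voting: average class probabilities across models
probs = torch.softmax(all_logits, dim=-1)  # (M, B, C)
avg_probs = probs.mean(dim=0)              # (B, C)
prediction = avg_probs.argmax(dim=1)       # (B,)

print(f"Soft-voting predicted phoneme class: {prediction.item()}")
print(f"Averaged probability: {avg_probs[0, prediction].item():.2%}")
```

Soft voting uses each model's full probability distribution, so it can break ties and reflect model confidence, whereas majority voting only counts top-1 predictions.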

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample

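The window and grouping settings above translate directly into tensor shapes. A minimal sketch with synthetic data (the actual grouping is done by the dataset pipeline):

```python
import torch

sfreq = 250          # sampling rate in Hz
window_s = 0.5       # window length in seconds
n_channels = 306     # MEG channels
n_samples = int(sfreq * window_s)  # 0.5 s at 250 Hz -> 125 samples

# Grouping: average 100 single-trial windows of the same phoneme into
# one training sample to boost SNR (synthetic stand-in data here).
trials = torch.randn(100, n_channels, n_samples)
training_sample = trials.mean(dim=0, keepdim=True)  # (1, 306, 125)

print(training_sample.shape)  # torch.Size([1, 306, 125])
```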
## Reproducibility

All 5 random seeds are provided. For best results on new data, we recommend using the ensemble approach, which achieved **65.8% F1-macro** on the competition holdout set.

## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
  title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
  author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
  year={2025},
  eprint={2512.01443},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.01443},
}
```

## License

The 3-Clause BSD License

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)