	---
	license: bsd-3-clause
	tags:
	- meg
	- brain-signals
	- speech-detection
	- conformer
	- libribrain
	datasets:
	- pnpl/LibriBrain
	metrics:
	- f1
	library_name: pytorch

	model-index:
	- name: megconformer-speech-detection
	  results:
	  - task:
	      type: audio-classification
	      name: Speech classification
	    dataset:
	      name: LibriBrain 2025 PNPL (Standard track, speech task)
	      type: pnpl/LibriBrain
	      split: holdout
	    metrics:
	    - name: F1-macro
	      type: f1
	      value: 0.8890 # 88.90 %
	      args:
	        average: macro
	---

	# MEGConformer for Speech Detection

	Conformer-based MEG decoder for binary speech detection (speech vs. silence). Checkpoints from 10 training runs with different random seeds are released for reproducibility.

	## Model Performance

	| Seed | Val F1-Macro | Checkpoint |
	|------|--------------|------------|
	| 0 (best) | 87.06% | `seed-0/pytorch_model.ckpt` |
	| 6 | 86.80% | `seed-6/pytorch_model.ckpt` |
	| 4 | 86.62% | `seed-4/pytorch_model.ckpt` |
	| 1 | 86.54% | `seed-1/pytorch_model.ckpt` |
	| 2 | 86.37% | `seed-2/pytorch_model.ckpt` |
	| 5 | 86.29% | `seed-5/pytorch_model.ckpt` |
	| 7 | 86.18% | `seed-7/pytorch_model.ckpt` |
	| 3 | 86.13% | `seed-3/pytorch_model.ckpt` |
	| 8 | 85.92% | `seed-8/pytorch_model.ckpt` |
	| 9 | 85.18% | `seed-9/pytorch_model.ckpt` |

	- Holdout F1-macro (seed 0): 88.90%
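
To get a feel for the seed-to-seed spread, the validation scores in the table can be aggregated with the standard library. This is only a small sketch: the variable names are illustrative, and the numbers are copied by hand from the table above.

```python
from statistics import mean, stdev

# Validation F1-macro (%) per seed, copied from the table above.
val_f1 = {
    0: 87.06, 1: 86.54, 2: 86.37, 3: 86.13, 4: 86.62,
    5: 86.29, 6: 86.80, 7: 86.18, 8: 85.92, 9: 85.18,
}

best_seed = max(val_f1, key=val_f1.get)
mean_f1 = mean(val_f1.values())
std_f1 = stdev(val_f1.values())  # sample standard deviation (n - 1)

print(f"best seed: {best_seed} ({val_f1[best_seed]:.2f}%)")
print(f"mean ± std over {len(val_f1)} seeds: {mean_f1:.2f}% ± {std_f1:.2f}%")
```

These are validation figures only; the holdout score is reported for seed 0 alone, so no comparable holdout spread can be derived from this card.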

	## Quick Start

	### Load Best Model
	```python
	import torch
	from huggingface_hub import hf_hub_download

	from libribrain_experiments.models.configurable_modules.classification_module import (
	    ClassificationModule,
	)

	# Download a checkpoint (seed-0)
	checkpoint_path = hf_hub_download(
	    repo_id="zuazo/megconformer-speech-detection",
	    filename="seed-0/pytorch_model.ckpt",
	)

	# Choose device
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	# Load model and move to device
	model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
	model.eval()

	# Inference
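	# Dummy input of shape (batch, channels, samples); 625 samples = 2.5 s at 250 Hz (see Model Details)
	meg_signal = torch.randn(1, 306, 625, device=device)  # Create directly on device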

	with torch.no_grad():
	    logits = model(meg_signal)
	    prediction = torch.argmax(logits, dim=1)  # 0=silence, 1=speech

	print(f"Prediction: {'Speech' if prediction.item() == 1 else 'Silence'}")
	```

	## Model Details

	- Architecture: Conformer Small
	- Hidden size: 144
	- FFN dim: 576
	- Layers: 16
	- Attention heads: 4
	- Depthwise conv kernel: 31
	- Input: 306-channel MEG signals
	- Window size: 2.5 seconds (625 samples at 250 Hz)
	- Output: Binary classification (silence/speech)
	- Training: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
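
The window length follows from the sampling rate: 2.5 s × 250 Hz = 625 samples. Below is a minimal sketch of how a longer recording could be cut into model-sized windows. The helper name `window_starts` and the hop size of 125 samples are assumptions made for illustration and are not taken from the released code.

```python
SFREQ_HZ = 250        # sampling rate listed above
WINDOW_SECONDS = 2.5  # window size listed above
WINDOW_SAMPLES = int(WINDOW_SECONDS * SFREQ_HZ)  # 625


def window_starts(n_samples: int, window: int = WINDOW_SAMPLES, hop: int = 125) -> list[int]:
    """Return the start index of every full window that fits in a recording.

    `hop` (stride between consecutive windows, in samples) is a hypothetical
    choice; the stride used for training or evaluation may differ.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if n_samples < window:
        return []
    return list(range(0, n_samples - window + 1, hop))


# Example: 5 seconds of MEG at 250 Hz (1250 samples).
starts = window_starts(1250)
print(WINDOW_SAMPLES, starts)
```

Given a recording stored as a `(306, n_samples)` array, each window would then be the slice `recording[:, s:s + WINDOW_SAMPLES]` for `s` in `starts`.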

	## Reproducibility

	Checkpoints for all 10 random seeds (`seed-0` to `seed-9`) are provided so that results can be reproduced and seed-to-seed variance inspected.
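
Since every seed follows the same `seed-N/pytorch_model.ckpt` layout shown in the performance table, fetching any of them can be wrapped in a small helper. This is a sketch, not part of the released code: `checkpoint_filename` and `download_checkpoint` are hypothetical names, and only `hf_hub_download` comes from `huggingface_hub`.

```python
REPO_ID = "zuazo/megconformer-speech-detection"
SEEDS = range(10)  # seeds 0-9, as listed in the performance table


def checkpoint_filename(seed: int) -> str:
    """Path of one seed's checkpoint inside the repository."""
    if seed not in SEEDS:
        raise ValueError(f"seed must be in 0-9, got {seed}")
    return f"seed-{seed}/pytorch_model.ckpt"


def download_checkpoint(seed: int) -> str:
    """Download one seed's checkpoint and return its local path."""
    # Imported lazily so the filename helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(seed))


# List the repository paths without downloading anything.
print([checkpoint_filename(s) for s in SEEDS])
```

The path returned by `download_checkpoint(seed)` can be passed to `ClassificationModule.load_from_checkpoint` in the same way as in the Quick Start.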

	## Citation
	```bibtex
	@misc{dezuazo2025megconformerconformerbasedmegdecoder,
	    title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
	    author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
	    year={2025},
	    eprint={2512.01443},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL},
	    url={https://arxiv.org/abs/2512.01443},
	}
	```

	## License

	The 3-Clause BSD License

	## Links

	- Paper: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
	- Code: [GitHub](https://github.com/neural2speech/libribrain-experiments)
	- Competition: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)