	---
	license: bsd-3-clause
	library_name: braindecode
	pipeline_tag: feature-extraction
	tags:
	- eeg
	- biosignal
	- pytorch
	- neuroscience
	- braindecode
	- convolutional
	---

	# BrainModule

	BrainModule from Défossez et al. (2023), also known as SimpleConv.

	> Architecture-only repository. This repo documents the
	> `braindecode.models.BrainModule` class. **No pretrained weights are
	> distributed here** — instantiate the model and train it on your own
	> data, or fine-tune from a published foundation-model checkpoint
	> separately.

	## Quick start

	```bash
	pip install braindecode
	```

	```python
	from braindecode.models import BrainModule

	model = BrainModule(
	    n_chans=22,
	    sfreq=250,
	    input_window_seconds=4.0,
	    n_outputs=4,
	)
	```

	The signal-shape arguments above are example defaults — adjust them
	to match your recording.

	## Documentation

	- Full API reference (parameters, references, architecture figure):
	<https://braindecode.org/stable/generated/braindecode.models.BrainModule.html>
	- Interactive browser with live instantiation:
	<https://huggingface.co/spaces/braindecode/model-explorer>
	- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/brainmodule.py#L25>

	## Architecture description

	The block below is the rendered class docstring (parameters,
	references, architecture figure where available).

	<div class='bd-doc'><main>
	<p>BrainModule from [brainmagick]_, also known as SimpleConv.</p>
	<blockquote>
	<p>A dilated convolutional encoder for EEG decoding, using residual
	connections and optional GLU gating for improved expressivity.</p>
	</blockquote>
	<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span>



	.. figure:: ../_static/model/simpleconv.png
	   :align: center
	   :alt: BrainModule Architecture
	   :width: 500px

	   Figure adapted from Extended Data Fig. 4 of [brainmagick]_ to highlight only the model part.
	   Architecture of the brain module, which processes the brain recordings.
	   For each layer, the authors note first the number of output channels; the number of time steps
	   is constant throughout the layers. The model is composed of a spatial attention layer,
	   then a 1x1 convolution without activation. A 'Subject Layer' is selected based on the subject index s;
	   it consists of a 1x1 convolution learnt only for that subject, with no activation. Then,
	   the authors apply five convolutional blocks, each made of three convolutions. The first
	   two use residual skip connections and increasing dilation, each followed by a BatchNorm layer and a
	   GELU activation. The third convolution is not residual, and uses a GLU activation
	   (which halves the number of channels) and no normalization.
	   Finally, the authors apply two 1x1 convolutions with a GELU in between.

	The BrainModule (also referred to as SimpleConv) is a deep dilated
	convolutional encoder specifically designed to decode perceived speech from
	non-invasive brain recordings like EEG and MEG. It is engineered to address
	the high noise levels and inter-individual variability inherent in
	non-invasive neuroimaging by using a single architecture trained across
	large cohorts while accommodating participant-specific differences.

	.. rubric:: Architecture Overview

	The BrainModule integrates three primary mechanisms to align brain activity
	with deep speech representations:

	1. Spatial-temporal feature extraction. The model uses a dedicated
	spatial attention layer to remap sensor data based on physical
	locations, followed by temporal processing through dilated convolutions.
	2. Subject-specific adaptation. To leverage inter-subject variability,
	the architecture includes a "Subject Layer" or participant-specific
	1x1 convolution that allows the model to share core weights across a
	cohort while learning individual-specific neural patterns.
	3. Dilated residual blocks with gating. The core encoder employs a
	stack of convolutional blocks featuring skip connections and increasing
	dilation to expand the receptive field without losing temporal
	resolution, supplemented by optional Gated Linear Units (GLU) for
	increased expressivity.
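
The subject-specific adaptation in item 2 can be illustrated with a small pure-Python sketch. It assumes that a 1x1 convolution over channels is equivalent to multiplying every time step by a channel-mixing matrix, and that one such matrix is stored per subject. The function name `subject_layer` and the toy matrices are hypothetical and are not part of braindecode.

```python
def subject_layer(x, subject_index, weights):
    """Apply the channel-mixing matrix of one subject to a (n_chans, n_times) signal."""
    matrix = weights[subject_index]  # shape (n_out, n_chans), selected by subject
    n_times = len(x[0])
    return [
        [sum(matrix[o][c] * x[c][t] for c in range(len(x))) for t in range(n_times)]
        for o in range(len(matrix))
    ]


# Hypothetical per-subject weights: subject 0 keeps channels as they are
# (identity initialization), subject 1 swaps the two channels.
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
weights = [identity, swap]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # (n_chans=2, n_times=3)
out_subject0 = subject_layer(x, 0, weights)
out_subject1 = subject_layer(x, 1, weights)
```

The rest of the network is shared across subjects; only the selected matrix differs between the two calls.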

	.. rubric:: Macro Components

	``BrainModule.input_projection`` (Initial Processing)
	Operations. Raw M/EEG input
	:math:`\mathbf{X} \in \mathbb{R}^{C \times T}` is first processed
	through a spatial attention layer that projects sensor locations onto a
	2D plane using Fourier-parameterized functions. This is followed by a
	subject-specific 1x1 convolution
	:math:`\mathbf{M}_s \in \mathbb{R}^{D_1 \times D_1}` if subject
	features are enabled. The resulting features are projected to the
	``hidden_dim`` (default 320) to ensure compatibility with subsequent
	residual connections.

	Role. Converts high-dimensional, subject-dependent sensor data into
	a standardized latent space while preserving spatial and temporal
	relationships.

	``BrainModule.encoder`` (Convolutional Sequence)
	Operations. Implemented via
	:class:`~braindecode.models.brainmodule._ConvSequence`, this component
	consists of a stack of ``k`` convolutional blocks. Each block typically
	contains: (a) Residual dilated convolutions. Two layers with kernel
	size 3, residual skip connections, and dilation factors that grow
	exponentially (e.g., powers of two with periodic resets) to capture
	multi-scale temporal context. (b) GLU gating. Every ``N`` layers
	(defined by ``glu``), a Gated Linear Unit is applied, which halves the
	channel dimension and introduces non-linear gating to filter
	intermediate representations.

	Role. Extracts deep hierarchical temporal features from the brain
	signal, significantly expanding the model's receptive field to align
	with the contextual windows of speech modules like wav2vec 2.0.
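
The dilation pattern described above can be sketched in a few lines. This is a minimal illustration, assuming that the dilation of layer `k` is `dilation_growth ** (k % dilation_period)` and that every layer is a stride-1 convolution. The helper names are hypothetical, and the estimate ignores any extra context added by GLU convolutions.

```python
def dilation_schedule(depth=10, dilation_growth=2, dilation_period=5):
    """Assumed per-layer dilation: grows geometrically and resets every period."""
    return [dilation_growth ** (k % dilation_period) for k in range(depth)]


def receptive_field(dilations, kernel_size=3):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = dilation_schedule()
rf = receptive_field(dilations)
print(dilations)  # [1, 2, 4, 8, 16, 1, 2, 4, 8, 16]
print(rf)         # 125
```

Under these assumptions the default settings cover 125 samples, which is 0.5 s at a sampling rate of 250 Hz.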

	.. rubric:: Temporal, Spatial, and Spectral Encoding

	- Temporal: Increasing dilation factors across layers allow the model to
	integrate information over large time windows without the computational
	cost of standard large kernels, while a 150 ms input shift facilitates
	alignment between stimulus and brain response.
	- Spatial: The spatial attention layer learns a softmax weighting over
	input sensors based on their 3D coordinates, allowing the model to focus
	on regions typically activated during auditory stimulation (e.g., the
	temporal cortex).
	- Spectral: Through the optional ``n_fft`` parameter, the model can
	apply an STFT transformation, converting time-domain signals into a
	spectrogram representation before encoding.
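
The spatial weighting can be sketched as follows. This is a simplified pure-Python illustration, assuming 2D sensor positions normalized to [0, 1], a truncated Fourier series as the scoring function, and a softmax over sensors for each output channel. All names, the number of frequencies, and the random coefficients are hypothetical; the actual layer in braindecode may differ in its details.

```python
import math
import random


def attention_scores(positions, z_real, z_imag):
    """Score each sensor with a truncated 2D Fourier series of its position."""
    n_freqs = len(z_real)
    scores = []
    for x, y in positions:
        score = 0.0
        for kx in range(n_freqs):
            for ky in range(n_freqs):
                phase = 2 * math.pi * (kx * x + ky * y)
                score += z_real[kx][ky] * math.cos(phase)
                score += z_imag[kx][ky] * math.sin(phase)
        scores.append(score)
    return scores


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention_channel(signal, positions, z_real, z_imag):
    """One output channel: softmax-weighted sum of the input sensors."""
    weights = softmax(attention_scores(positions, z_real, z_imag))
    n_times = len(signal[0])
    return [sum(w * ch[t] for w, ch in zip(weights, signal)) for t in range(n_times)]


rng = random.Random(0)
positions = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)]  # hypothetical sensor layout
n_freqs = 2
z_real = [[rng.gauss(0, 1) for _ in range(n_freqs)] for _ in range(n_freqs)]
z_imag = [[rng.gauss(0, 1) for _ in range(n_freqs)] for _ in range(n_freqs)]
signal = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # (n_chans=3, n_times=2)

weights = softmax(attention_scores(positions, z_real, z_imag))
out = spatial_attention_channel(signal, positions, z_real, z_imag)
```

In a trained model the Fourier coefficients would be learned, one set per output channel, so that each output channel attends to a different region of the sensor layout.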

	.. rubric:: Additional Mechanisms

	- Clamping and scaling: The model relies on clamping input values
	(e.g., at 20 standard deviations) to prevent outliers and large
	electromagnetic artifacts from destabilizing the BatchNorm estimates and
	optimization process.
	- Scaled subject embeddings: When ``subject_dim`` is used, the
	:class:`~braindecode.models.brainmodule._ScaledEmbedding` layer scales up
	the learning rate for subject-specific features to prevent slow
	convergence in multi-participant training.
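
The effect of scaled subject embeddings can be illustrated with a scalar toy example. It assumes the layer stores `weight / scale` as its trainable parameter and returns `stored * scale`, and it uses plain SGD. The class `ScaledEmbeddingSketch` is a hypothetical stand-in, not the braindecode implementation.

```python
class ScaledEmbeddingSketch:
    """Toy scalar embedding: stores weight / scale, outputs stored * scale."""

    def __init__(self, init_value, scale):
        self.scale = scale
        self.stored = init_value / scale  # the parameter the optimizer updates

    def forward(self):
        return self.stored * self.scale

    def sgd_step(self, grad_output, lr):
        # Chain rule: d(output) / d(stored) = scale.
        grad_stored = grad_output * self.scale
        self.stored -= lr * grad_stored


plain = ScaledEmbeddingSketch(init_value=1.0, scale=1.0)
boosted = ScaledEmbeddingSketch(init_value=1.0, scale=10.0)

# Both start from the same output and receive the same upstream gradient.
for emb in (plain, boosted):
    emb.sgd_step(grad_output=0.5, lr=0.01)

plain_change = plain.forward() - 1.0
boosted_change = boosted.forward() - 1.0
```

With plain SGD the output of the scaled version moves `scale ** 2` times further per step, because the gradient is multiplied by `scale` and the update is multiplied by `scale` again on the way out. Adaptive optimizers normalize gradients, so the speed-up there is likely to be different.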


	- _ConvSequence and residual logic: This class handles the actual
	stacking of layers. It is designed to be flexible with the ``growth``
	parameter; if the channel size changes between layers (``growth != 1.0``),
	it automatically applies a 1x1 ``skip_projection`` convolution to the
	residual path so dimensions match for addition.
	- _ChannelDropout: Unlike standard dropout which zeroes individual
	neurons, this zeroes entire channels. It includes a rescale feature that
	multiplies the remaining channels by a factor
	``total_channels / active_channels`` to maintain the expected value of the
	signal during training.
	- _ScaledEmbedding: This speeds up multi-subject learning. The initial
	weights are divided by a scale and the output is multiplied by the same
	scale, which increases the gradient magnitude for the embedding weights
	and lets subject-specific features learn faster than the shared
	backbone.
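
The rescaling used by channel dropout can be sketched in pure Python. This is a minimal illustration, assuming each channel is dropped independently with probability `drop_prob` and the surviving channels are multiplied by `total_channels / active_channels`. The function `channel_dropout` is hypothetical and only mirrors the behaviour described above.

```python
import random


def channel_dropout(signal, drop_prob, rng, rescale=True):
    """Zero whole channels at random, then rescale the surviving channels."""
    keep = [rng.random() >= drop_prob for _ in signal]
    n_active = sum(keep)
    if n_active == 0:
        # Every channel was dropped: nothing left to rescale.
        return [[0.0] * len(ch) for ch in signal], keep
    factor = len(signal) / n_active if rescale else 1.0
    out = [
        [value * factor if kept else 0.0 for value in ch]
        for ch, kept in zip(signal, keep)
    ]
    return out, keep


rng = random.Random(0)
signal = [[1.0, 1.0, 1.0] for _ in range(4)]  # (n_chans=4, n_times=3)
dropped, keep = channel_dropout(signal, drop_prob=0.5, rng=rng)
```

When the channels carry similar amplitudes, the rescaling keeps the summed signal across channels roughly constant regardless of how many channels were dropped.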


	Parameters
	----------
	hidden_dim : int, default=320
	Hidden dimension for convolutional layers. Input is projected to this
	dimension before the convolutional blocks.
	depth : int, default=10
	Number of convolutional blocks. Each block contains a dilated convolution
	with batch normalization and activation, followed by a residual connection.
	kernel_size : int, default=3
	Convolutional kernel size. Must be odd for proper padding with dilation.
	growth : float, default=1.0
	Channel size multiplier: hidden_dim * (growth ** layer_index).
	Values > 1.0 grow channels deeper; < 1.0 shrink them.
	Note: growth != 1.0 disables residual connections between layers
	with different channel sizes.
	dilation_growth : int, default=2
	Dilation multiplier per layer (e.g., 2 means dilation doubles each layer).
	Improves receptive field exponentially. Requires odd kernel_size.
	dilation_period : int, default=5
	Reset dilation to 1 every N layers. Prevents dilation from growing
	too large and maintains local connectivity.
	conv_drop_prob : float, default=0.0
	Dropout probability for convolutional layers.
	dropout_input : float, default=0.0
	Dropout probability applied to model input only.
	batch_norm : bool, default=True
	If True, apply batch normalization after each convolution.
	activation : type[nn.Module], default=nn.GELU
	Activation function class to use (e.g., nn.GELU, nn.ReLU, nn.ELU).
	n_subjects : int, default=200
	Number of unique subjects (for subject-specific pathways).
	Only used if subject_dim > 0.
	subject_dim : int, default=0
	Dimension of subject embeddings. If 0, no subject-specific features.
	If > 0, adds subject embeddings to the input before encoding.
	subject_layers : bool, default=False
	If True, apply subject-specific linear transformations to input channels.
	Each subject has its own weight matrix. Requires subject_dim > 0.
	subject_layers_dim : str, default="input"
	Where to apply subject layers: "input" or "hidden".
	subject_layers_id : bool, default=False
	If True, initialize subject layers as identity matrices.
	embedding_scale : float, default=1.0
	Scaling factor for subject embeddings learning rate.
	n_fft : int, optional
	FFT size for STFT processing. If None, no STFT is applied.
	If specified, applies spectrogram transform before encoding.
	fft_complex : bool, default=True
	If True, keep complex spectrogram. If False, use power spectrogram.
	Only used when n_fft is not None.
	channel_dropout_prob : float, default=0.0
	Probability of dropping each channel during training (0.0 to 1.0).
	If 0.0, no channel dropout is applied.
	channel_dropout_type : str, optional
	If specified with chs_info, only drop channels of this type
	(e.g., 'eeg', 'ref', 'eog'). If None with channel_dropout_prob > 0, drops any channel.
	glu : int, default=2
	If > 0, applies Gated Linear Units (GLU) every N convolutional layers.
	GLUs gate intermediate representations for more expressivity.
	If 0, no GLU is applied.
	glu_context : int, default=1
	Context window size for GLU gates. If > 0, uses contextual information
	from neighboring time steps for gating. Requires glu > 0.
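
The gating controlled by ``glu`` can be sketched as follows. The GLU itself is the standard operation (first half of the channels multiplied by the sigmoid of the second half). The placement rule in `glu_layer_indices` is an assumption made for illustration, and both helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(x):
    """Gated Linear Unit over the channel axis of a (n_chans, n_times) signal."""
    if len(x) % 2:
        raise ValueError("GLU needs an even number of channels")
    half = len(x) // 2
    return [
        [value * sigmoid(gate) for value, gate in zip(x[c], x[c + half])]
        for c in range(half)
    ]


def glu_layer_indices(depth=10, glu_every=2):
    """Assumed placement: a GLU after every `glu_every`-th layer (0-based index)."""
    if glu_every <= 0:
        return []
    return [k for k in range(depth) if (k + 1) % glu_every == 0]


x = [
    [1.0, 2.0],    # value channel 0
    [3.0, 4.0],    # value channel 1
    [0.0, 0.0],    # gate for channel 0 (sigmoid = 0.5)
    [0.0, 100.0],  # gate for channel 1 (half open, then fully open)
]
gated = glu(x)
layers_with_glu = glu_layer_indices()
```

Four input channels become two output channels, which is why the convolution feeding a GLU typically produces twice the target channel count.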

	References
	----------
	.. [brainmagick] Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J. R.
	(2023). Decoding speech perception from non-invasive brain recordings. Nature
	Machine Intelligence, 5(10), 1097-1107.

	Notes
	-----
	- Input shape: (batch, n_chans, n_times)
	- Output shape: (batch, n_outputs)
	- The model uses dilated convolutions with stride=1 to maintain temporal
	resolution while achieving large receptive fields.
	- Residual connections are applied at every layer where input and output
	channels match.
	- Subject-specific features (subject_dim > 0, subject_layers) require passing
	subject indices in the forward pass as an optional parameter or via batch.
	- STFT processing (n_fft is not None) automatically transforms input to spectrogram domain.

	.. versionadded:: 1.2

	.. rubric:: Hugging Face Hub integration

	When the optional ``huggingface_hub`` package is installed, all models
	automatically gain the ability to be pushed to and loaded from the
	Hugging Face Hub. Install with::

	    pip install braindecode[hub]

	Pushing a model to the Hub:

	.. code::

	    from braindecode.models import BrainModule

	    # Train your model
	    model = BrainModule(n_chans=22, n_outputs=4, n_times=1000)
	    # ... training code ...

	    # Push to the Hub
	    model.push_to_hub(
	        repo_id="username/my-brainmodule-model",
	        commit_message="Initial model upload",
	    )

	Loading a model from the Hub:

	.. code::

	    from braindecode.models import BrainModule

	    # Load pretrained model
	    model = BrainModule.from_pretrained("username/my-brainmodule-model")

	    # Load with a different number of outputs (head is rebuilt automatically)
	    model = BrainModule.from_pretrained("username/my-brainmodule-model", n_outputs=4)

	Extracting features and replacing the head:

	.. code::

	    import torch

	    # `model` is the BrainModule instance created or loaded above
	    x = torch.randn(1, model.n_chans, model.n_times)
	    # Extract encoder features (consistent dict across all models)
	    out = model(x, return_features=True)
	    features = out["features"]

	    # Replace the classification head
	    model.reset_head(n_outputs=10)

	Saving and restoring full configuration:

	.. code::

	    import json

	    from braindecode.models import BrainModule

	    config = model.get_config()  # all __init__ params
	    with open("config.json", "w") as f:
	        json.dump(config, f)

	    model2 = BrainModule.from_config(config)  # reconstruct (no weights)

	All model parameters (both EEG-specific and model-specific such as
	dropout rates, activation functions, number of filters) are automatically
	saved to the Hub and restored when loading.

	See :ref:`load-pretrained-models` for a complete tutorial.</main>
	</div>

	## Citation

	Please cite both the original paper for this architecture (see the
	References section above) and braindecode:

	```bibtex
	@article{aristimunha2025braindecode,
	title = {Braindecode: a deep learning library for raw electrophysiological data},
	author = {Aristimunha, Bruno and others},
	journal = {Zenodo},
	year = {2025},
	doi = {10.5281/zenodo.17699192},
	}
	```

	## License

	BSD-3-Clause for the model code (matching braindecode).
	Pretraining-derived weights, if you fine-tune from a checkpoint,
	inherit the licence of that checkpoint and its training corpus.