BrainModule / README.md
bruAristimunha's picture
Add architecture-only model card
2e2923c verified
|
raw
history blame
14.9 kB
---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- convolutional
---
# BrainModule
BrainModule from , also known as SimpleConv.
> **Architecture-only repository.** This repo documents the
> `braindecode.models.BrainModule` class. **No pretrained weights are
> distributed here** — instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.
## Quick start
```bash
pip install braindecode
```
```python
from braindecode.models import BrainModule
model = BrainModule(
n_chans=22,
sfreq=250,
input_window_seconds=4.0,
n_outputs=4,
)
```
The signal-shape arguments above are example defaults — adjust them
to match your recording.
## Documentation
- Full API reference (parameters, references, architecture figure):
<https://braindecode.org/stable/generated/braindecode.models.BrainModule.html>
- Interactive browser with live instantiation:
<https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/brainmodule.py#L25>
## Architecture description
The block below is the rendered class docstring (parameters,
references, architecture figure where available).
<div class='bd-doc'><main>
<p>BrainModule from [brainmagick]_, also known as SimpleConv.</p>
<blockquote>
<p>A dilated convolutional encoder for EEG decoding, using residual
connections and optional GLU gating for improved expressivity.</p>
</blockquote>
<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span>
.. figure:: ../_static/model/simpleconv.png
:align: center
:alt: BrainModule Architecture
:width: 500px
Figure adapted Extended Data Fig. 4 from [brainmagick]_ to highlight only the model part.
Architecture of the brain module. Architecture used to process the brain recordings.
For each layer, the authors note first the number of output channels, while the number of time steps
is constant throughout the layers. The model is composed of a spatial attention layer,
then a 1x1 convolution without activation. A 'Subject Layer' is selected based on the subject index s,
which consists in a 1x1 convolution learnt only for that subject with no activation. Then,
the authors apply five convolutional blocks made of three convolutions. The first
two use residual skip connection and increasing dilation, followed by a BatchNorm layer and a
GELU activation. The third convolution is not residual, and uses a GLU activation
(which halves the number of channels) and no normalization.
Finally, the authors apply two 1x1 convolutions with a GELU in between.
The BrainModule (also referred to as SimpleConv) is a deep dilated
convolutional encoder specifically designed to decode perceived speech from
non-invasive brain recordings like EEG and MEG. It is engineered to address
the high noise levels and inter-individual variability inherent in
non-invasive neuroimaging by using a single architecture trained across
large cohorts while accommodating participant-specific differences.
.. rubric:: Architecture Overview
The BrainModule integrates three primary mechanisms to align brain activity
with deep speech representations:
1. **Spatial-temporal feature extraction.** The model uses a dedicated
spatial attention layer to remap sensor data based on physical
locations, followed by temporal processing through dilated convolutions.
2. **Subject-specific adaptation.** To leverage inter-subject variability,
the architecture includes a "Subject Layer" or participant-specific
1x1 convolution that allows the model to share core weights across a
cohort while learning individual-specific neural patterns.
3. **Dilated residual blocks with gating.** The core encoder employs a
stack of convolutional blocks featuring skip connections and increasing
dilation to expand the receptive field without losing temporal
resolution, supplemented by optional Gated Linear Units (GLU) for
increased expressivity.
.. rubric:: Macro Components
``BrainModule.input_projection`` (Initial Processing)
**Operations.** Raw M/EEG input
:math:`\mathbf{X} \in \mathbb{R}^{C \times T}` is first processed
through a spatial attention layer that projects sensor locations onto a
2D plane using Fourier-parameterized functions. This is followed by a
subject-specific 1x1 convolution
:math:`\mathbf{M}_s \in \mathbb{R}^{D_1 \times D_1}` if subject
features are enabled. The resulting features are projected to the
``hidden_dim`` (default 320) to ensure compatibility with subsequent
residual connections.
**Role.** Converts high-dimensional, subject-dependent sensor data into
a standardized latent space while preserving spatial and temporal
relationships.
``BrainModule.encoder`` (Convolutional Sequence)
**Operations.** Implemented via
:class:`~braindecode.models.brainmodule._ConvSequence`, this component
consists of a stack of ``k`` convolutional blocks. Each block typically
contains: (a) **Residual dilated convolutions.** Two layers with kernel
size 3, residual skip connections, and dilation factors that grow
exponentially (e.g., powers of two with periodic resets) to capture
multi-scale temporal context. (b) **GLU gating.** Every ``N`` layers
(defined by ``glu``), a Gated Linear Unit is applied, which halves the
channel dimension and introduces non-linear gating to filter
intermediate representations.
**Role.** Extracts deep hierarchical temporal features from the brain
signal, significantly expanding the model's receptive field to align
with the contextual windows of speech modules like wav2vec 2.0.
.. rubric:: Temporal, Spatial, and Spectral Encoding
- **Temporal:** Increasing dilation factors across layers allow the model to
integrate information over large time windows without the computational
cost of standard large kernels, while a 150 ms input shift facilitates
alignment between stimulus and brain response.
- **Spatial:** The spatial attention layer learns a softmax weighting over
input sensors based on their 3D coordinates, allowing the model to focus
on regions typically activated during auditory stimulation (e.g., the
temporal cortex).
- **Spectral:** Through the optional ``n_fft`` parameter, the model can
apply an STFT transformation, converting time-domain signals into a
spectrogram representation before encoding.
.. rubric:: Additional Mechanisms
- **Clamping and scaling:** The model relies on clamping input values
(e.g., at 20 standard deviations) to prevent outliers and large
electromagnetic artifacts from destabilizing the BatchNorm estimates and
optimization process.
- **Scaled subject embeddings:** When ``subject_dim`` is used, the
:class:`~braindecode.models.brainmodule._ScaledEmbedding` layer scales up
the learning rate for subject-specific features to prevent slow
convergence in multi-participant training.
- **_ConvSequence and residual logic:** This class handles the actual
stacking of layers. It is designed to be flexible with the ``growth``
parameter; if the channel size changes between layers (``growth != 1.0``),
it automatically applies a 1x1 ``skip_projection`` convolution to the
residual path so dimensions match for addition.
- **_ChannelDropout:** Unlike standard dropout which zeroes individual
neurons, this zeroes entire channels. It includes a rescale feature that
multiplies the remaining channels by a factor
``total_channels / active_channels`` to maintain the expected value of the
signal during training.
- **_ScaledEmbedding:** This is a clever optimization for multi-subject
learning. By dividing the initial weights by a scale and then multiplying
the output by the same scale, it effectively increases the gradient
magnitude for the embedding weights, allowing subject-specific features to
learn faster than the shared backbone.
Parameters
----------
hidden_dim : int, default=320
Hidden dimension for convolutional layers. Input is projected to this
dimension before the convolutional blocks.
depth : int, default=10
Number of convolutional blocks. Each block contains a dilated convolution
with batch normalization and activation, followed by a residual connection.
kernel_size : int, default=3
Convolutional kernel size. Must be odd for proper padding with dilation.
growth : float, default=1.0
Channel size multiplier: hidden_dim * (growth ** layer_index).
Values > 1.0 grow channels deeper; < 1.0 shrink them.
Note: growth != 1.0 disables residual connections between layers
with different channel sizes.
dilation_growth : int, default=2
Dilation multiplier per layer (e.g., 2 means dilation doubles each layer).
Improves receptive field exponentially. Requires odd kernel_size.
dilation_period : int, default=5
Reset dilation to 1 every N layers. Prevents dilation from growing
too large and maintains local connectivity.
conv_drop_prob : float, default=0.0
Dropout probability for convolutional layers.
dropout_input : float, default=0.0
Dropout probability applied to model input only.
batch_norm : bool, default=True
If True, apply batch normalization after each convolution.
activation : type[nn.Module], default=nn.GELU
Activation function class to use (e.g., nn.GELU, nn.ReLU, nn.ELU).
n_subjects : int, default=200
Number of unique subjects (for subject-specific pathways).
Only used if subject_dim > 0.
subject_dim : int, default=0
Dimension of subject embeddings. If 0, no subject-specific features.
If > 0, adds subject embeddings to the input before encoding.
subject_layers : bool, default=False
If True, apply subject-specific linear transformations to input channels.
Each subject has its own weight matrix. Requires subject_dim > 0.
subject_layers_dim : str, default="input"
Where to apply subject layers: "input" or "hidden".
subject_layers_id : bool, default=False
If True, initialize subject layers as identity matrices.
embedding_scale : float, default=1.0
Scaling factor for subject embeddings learning rate.
n_fft : int, optional
FFT size for STFT processing. If None, no STFT is applied.
If specified, applies spectrogram transform before encoding.
fft_complex : bool, default=True
If True, keep complex spectrogram. If False, use power spectrogram.
Only used when n_fft is not None.
channel_dropout_prob : float, default=0.0
Probability of dropping each channel during training (0.0 to 1.0).
If 0.0, no channel dropout is applied.
channel_dropout_type : str, optional
If specified with chs_info, only drop channels of this type
(e.g., 'eeg', 'ref', 'eog'). If None with dropout_prob > 0, drops any channel.
glu : int, default=2
If > 0, applies Gated Linear Units (GLU) every N convolutional layers.
GLUs gate intermediate representations for more expressivity.
If 0, no GLU is applied.
glu_context : int, default=1
Context window size for GLU gates. If > 0, uses contextual information
from neighboring time steps for gating. Requires glu > 0.
References
----------
.. [brainmagick] Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J. R.
(2023). Decoding speech perception from non-invasive brain recordings. Nature
Machine Intelligence, 5(10), 1097-1107.
Notes
-----
- Input shape: (batch, n_chans, n_times)
- Output shape: (batch, n_outputs)
- The model uses dilated convolutions with stride=1 to maintain temporal
resolution while achieving large receptive fields.
- Residual connections are applied at every layer where input and output
channels match.
- Subject-specific features (subject_dim > 0, subject_layers) require passing
subject indices in the forward pass as an optional parameter or via batch.
- STFT processing (n_fft > 0) automatically transforms input to spectrogram domain.
.. versionadded:: 1.2
.. rubric:: Hugging Face Hub integration
When the optional ``huggingface_hub`` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with::
pip install braindecode[hub]
**Pushing a model to the Hub:**
.. code::
from braindecode.models import BrainModule
# Train your model
model = BrainModule(n_chans=22, n_outputs=4, n_times=1000)
# ... training code ...
# Push to the Hub
model.push_to_hub(
repo_id="username/my-brainmodule-model",
commit_message="Initial model upload",
)
**Loading a model from the Hub:**
.. code::
from braindecode.models import BrainModule
# Load pretrained model
model = BrainModule.from_pretrained("username/my-brainmodule-model")
# Load with a different number of outputs (head is rebuilt automatically)
model = BrainModule.from_pretrained("username/my-brainmodule-model", n_outputs=4)
**Extracting features and replacing the head:**
.. code::
import torch
x = torch.randn(1, model.n_chans, model.n_times)
# Extract encoder features (consistent dict across all models)
out = model(x, return_features=True)
features = out["features"]
# Replace the classification head
model.reset_head(n_outputs=10)
**Saving and restoring full configuration:**
.. code::
import json
config = model.get_config() # all __init__ params
with open("config.json", "w") as f:
json.dump(config, f)
model2 = BrainModule.from_config(config) # reconstruct (no weights)
All model parameters (both EEG-specific and model-specific such as
dropout rates, activation functions, number of filters) are automatically
saved to the Hub and restored when loading.
See :ref:`load-pretrained-models` for a complete tutorial.</main>
</div>
## Citation
Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:
```bibtex
@article{aristimunha2025braindecode,
title = {Braindecode: a deep learning library for raw electrophysiological data},
author = {Aristimunha, Bruno and others},
journal = {Zenodo},
year = {2025},
doi = {10.5281/zenodo.17699192},
}
```
## License
BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the licence of that checkpoint and its training corpus.