---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
  - eeg
  - biosignal
  - pytorch
  - neuroscience
  - braindecode
  - convolutional
---

# BrainModule

BrainModule from Défossez et al. (2023), also known as SimpleConv.

> **Architecture-only repository.** This repo documents the
> `braindecode.models.BrainModule` class. **No pretrained weights are
> distributed here** — instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import BrainModule

model = BrainModule(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)
```

The signal-shape arguments above are example defaults — adjust them
to match your recording.
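
As a rough sketch of how the signal-shape arguments relate, the snippet below converts a window length in seconds into a sample count and builds the `(batch, n_chans, n_times)` input shape listed in the docstring notes. The helper `window_to_samples` is hypothetical and not part of braindecode; it assumes the sample count is simply `sfreq * input_window_seconds`, rounded to the nearest integer.

```python
def window_to_samples(sfreq: float, input_window_seconds: float) -> int:
    """Hypothetical helper: number of samples in one input window."""
    return int(round(sfreq * input_window_seconds))


n_chans = 22
n_times = window_to_samples(sfreq=250, input_window_seconds=4.0)

# Input layout expected by the model: (batch, n_chans, n_times)
batch_size = 8
input_shape = (batch_size, n_chans, n_times)
print(input_shape)  # (8, 22, 1000)
```

With these values the window holds 1000 samples, which matches the `n_times=1000` used in the Hub examples further down.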

## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.BrainModule.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/brainmodule.py#L25>

## Architecture description

The block below is the rendered class docstring (parameters,
references, architecture figure where available).

<div class='bd-doc'><main>
<p>BrainModule from [brainmagick]_, also known as SimpleConv.</p>
<blockquote>
<p>A dilated convolutional encoder for EEG decoding, using residual
connections and optional GLU gating for improved expressivity.</p>
</blockquote>
<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span>



 .. figure:: ../_static/model/simpleconv.png
     :align: center
     :alt: BrainModule Architecture
     :width: 500px

     Figure adapted from Extended Data Fig. 4 of [brainmagick]_ to highlight only the model part.
     Architecture of the brain module used to process the brain recordings.
     For each layer, the authors note first the number of output channels, while the number of time steps
     is constant throughout the layers. The model is composed of a spatial attention layer,
     then a 1x1 convolution without activation. A 'Subject Layer' is selected based on the subject index s;
     it consists of a 1x1 convolution learnt only for that subject, with no activation. Then,
     the authors apply five convolutional blocks made of three convolutions. The first
     two use residual skip connection and increasing dilation, followed by a BatchNorm layer and a
     GELU activation. The third convolution is not residual, and uses a GLU activation
     (which halves the number of channels) and no normalization.
     Finally, the authors apply two 1x1 convolutions with a GELU in between.

 The BrainModule (also referred to as SimpleConv) is a deep dilated
 convolutional encoder specifically designed to decode perceived speech from
 non-invasive brain recordings like EEG and MEG. It is engineered to address
 the high noise levels and inter-individual variability inherent in
 non-invasive neuroimaging by using a single architecture trained across
 large cohorts while accommodating participant-specific differences.

 .. rubric:: Architecture Overview

 The BrainModule integrates three primary mechanisms to align brain activity
 with deep speech representations:

 1. **Spatial-temporal feature extraction.** The model uses a dedicated
    spatial attention layer to remap sensor data based on physical
    locations, followed by temporal processing through dilated convolutions.
 2. **Subject-specific adaptation.** To leverage inter-subject variability,
    the architecture includes a "Subject Layer" or participant-specific
    1x1 convolution that allows the model to share core weights across a
    cohort while learning individual-specific neural patterns.
 3. **Dilated residual blocks with gating.** The core encoder employs a
    stack of convolutional blocks featuring skip connections and increasing
    dilation to expand the receptive field without losing temporal
    resolution, supplemented by optional Gated Linear Units (GLU) for
    increased expressivity.
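
The gating mentioned in item 3 can be illustrated with a minimal, pure-Python sketch of a Gated Linear Unit over the channel axis. It follows the usual GLU definition (first half of the channels multiplied by the sigmoid of the second half) and is only an illustration, not the braindecode implementation; the function names are chosen for this example.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def glu(x: list[list[float]]) -> list[list[float]]:
    """Gated Linear Unit over the channel axis of a (channels, time) array.

    The first half of the channels carries values and the second half
    carries gates, so the output has half as many channels as the input.
    """
    if len(x) % 2 != 0:
        raise ValueError("GLU needs an even number of channels")
    half = len(x) // 2
    values, gates = x[:half], x[half:]
    return [
        [v * sigmoid(g) for v, g in zip(v_row, g_row)]
        for v_row, g_row in zip(values, gates)
    ]


# 4 channels x 3 time steps in, 2 channels x 3 time steps out.
x = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [0.0, 0.0, 0.0],       # gate of 0 -> sigmoid is 0.5, values are halved
    [100.0, -100.0, 0.0],  # large gate passes the value, very negative gate blocks it
]
y = glu(x)
print(len(x), "->", len(y), "channels")
```

This is why a convolution feeding a GLU typically produces twice the target number of channels: the gate consumes half of them.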

 .. rubric:: Macro Components

 ``BrainModule.input_projection`` (Initial Processing)
     **Operations.** Raw M/EEG input
     :math:`\mathbf{X} \in \mathbb{R}^{C \times T}` is first processed
     through a spatial attention layer that projects sensor locations onto a
     2D plane using Fourier-parameterized functions. This is followed by a
     subject-specific 1x1 convolution
     :math:`\mathbf{M}_s \in \mathbb{R}^{D_1 \times D_1}` if subject
     features are enabled. The resulting features are projected to the
     ``hidden_dim`` (default 320) to ensure compatibility with subsequent
     residual connections.

     **Role.** Converts high-dimensional, subject-dependent sensor data into
     a standardized latent space while preserving spatial and temporal
     relationships.
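
To make the subject-specific 1x1 convolution concrete, here is a small sketch that treats it as one channel-mixing matrix per subject, selected by the subject index and applied independently at every time step. The function names and the toy weights are assumptions for illustration only; the identity initialization mirrors what `subject_layers_id=True` is described as doing.

```python
def identity(n: int) -> list[list[float]]:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def apply_subject_layer(x, weights, subject):
    """Apply the 1x1 convolution of one subject to a (channels, time) signal.

    A 1x1 convolution is a matrix product over the channel axis that is
    repeated independently at every time step.
    """
    m = weights[subject]  # (out_channels, in_channels) matrix for this subject
    n_times = len(x[0])
    return [
        [sum(m_row[c] * x[c][t] for c in range(len(x))) for t in range(n_times)]
        for m_row in m
    ]


n_subjects, n_chans = 3, 2
# One matrix per subject, initialized to identity.
weights = [identity(n_chans) for _ in range(n_subjects)]
# Pretend training changed subject 1 so that it swaps the two channels.
weights[1] = [[0.0, 1.0], [1.0, 0.0]]

x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(apply_subject_layer(x, weights, subject=0))  # input passes through unchanged
print(apply_subject_layer(x, weights, subject=1))  # the two channels are swapped
```

All subjects share the layers that follow; only this matrix differs from one subject to the next.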

 ``BrainModule.encoder`` (Convolutional Sequence)
     **Operations.** Implemented via
     :class:`~braindecode.models.brainmodule._ConvSequence`, this component
     consists of a stack of ``k`` convolutional blocks. Each block typically
     contains: (a) **Residual dilated convolutions.** Two layers with kernel
     size 3, residual skip connections, and dilation factors that grow
     exponentially (e.g., powers of two with periodic resets) to capture
     multi-scale temporal context. (b) **GLU gating.** Every ``N`` layers
     (defined by ``glu``), a Gated Linear Unit is applied, which halves the
     channel dimension and introduces non-linear gating to filter
     intermediate representations.

     **Role.** Extracts deep hierarchical temporal features from the brain
     signal, significantly expanding the model's receptive field to align
     with the contextual windows of speech modules like wav2vec 2.0.
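
The dilation pattern and the resulting receptive field can be worked out by hand. The sketch below assumes one dilated convolution per layer index, a dilation of `dilation_growth ** (k % dilation_period)` for layer `k`, and stride 1; it ignores any extra context added by the GLU convolutions, so treat the numbers as a lower bound rather than a statement about the actual implementation.

```python
def dilation_schedule(depth: int, dilation_growth: int, dilation_period: int) -> list[int]:
    """Dilation per layer, reset to 1 every `dilation_period` layers."""
    return [dilation_growth ** (k % dilation_period) for k in range(depth)]


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = dilation_schedule(depth=10, dilation_growth=2, dilation_period=5)
print(dilations)  # [1, 2, 4, 8, 16, 1, 2, 4, 8, 16]

rf = receptive_field(kernel_size=3, dilations=dilations)
print(rf)         # 125 samples
print(rf / 250)   # 0.5 s at a 250 Hz sampling rate
```

Ten undilated layers with the same kernel size would cover only 21 samples, which shows how much dilation widens the temporal context at no extra parameter cost.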

 .. rubric:: Temporal, Spatial, and Spectral Encoding

 - **Temporal:** Increasing dilation factors across layers allow the model to
   integrate information over large time windows without the computational
   cost of standard large kernels, while a 150 ms input shift facilitates
   alignment between stimulus and brain response.
 - **Spatial:** The spatial attention layer learns a softmax weighting over
   input sensors based on their coordinates projected onto a 2D plane, allowing the model to focus
   on regions typically activated during auditory stimulation (e.g., the
   temporal cortex).
 - **Spectral:** Through the optional ``n_fft`` parameter, the model can
   apply an STFT transformation, converting time-domain signals into a
   spectrogram representation before encoding.
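
The spatial weighting described above can be sketched in a few lines. This is a simplified illustration, not the library code: each output channel scores every sensor from its 2D position with a small Fourier series, the scores are turned into weights with a softmax over sensors, and the output is the weighted sum of sensor signals. The coefficient layout, function names, and toy positions are assumptions made for this example; in a real model the coefficients would be learned.

```python
import math


def fourier_score(pos, coeffs):
    """Score of one sensor at 2D position `pos` for one output channel.

    `coeffs` maps a frequency pair (kx, ky) to (cos_weight, sin_weight).
    """
    x, y = pos
    total = 0.0
    for (kx, ky), (a, b) in coeffs.items():
        phase = 2.0 * math.pi * (kx * x + ky * y)
        total += a * math.cos(phase) + b * math.sin(phase)
    return total


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def spatial_attention(signal, positions, coeffs_per_output):
    """Remap (sensors, time) to (outputs, time) with softmax sensor weights."""
    out = []
    for coeffs in coeffs_per_output:
        w = softmax([fourier_score(p, coeffs) for p in positions])
        out.append([
            sum(w_i * ch[t] for w_i, ch in zip(w, signal))
            for t in range(len(signal[0]))
        ])
    return out


positions = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)]  # 3 sensors on the 2D plane
signal = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # (sensors, time)
coeffs_per_output = [
    {(0, 0): (0.0, 0.0)},                       # flat scores -> plain average of sensors
    {(1, 0): (2.0, 0.0), (0, 1): (0.0, -1.0)},  # position-dependent weighting
]
remapped = spatial_attention(signal, positions, coeffs_per_output)
print(len(remapped), len(remapped[0]))  # 2 2
```

Because the weights depend on sensor positions rather than on channel order, the same layer can in principle be applied to recordings with different montages.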

 .. rubric:: Additional Mechanisms

 - **Clamping and scaling:** The model relies on clamping input values
   (e.g., at 20 standard deviations) to prevent outliers and large
   electromagnetic artifacts from destabilizing the BatchNorm estimates and
   optimization process.
 - **Scaled subject embeddings:** When ``subject_dim`` is used, the
   :class:`~braindecode.models.brainmodule._ScaledEmbedding` layer scales up
   the learning rate for subject-specific features to prevent slow
   convergence in multi-participant training.


 - **_ConvSequence and residual logic:** This class handles the actual
   stacking of layers. It is designed to be flexible with the ``growth``
   parameter; if the channel size changes between layers (``growth != 1.0``),
   it automatically applies a 1x1 ``skip_projection`` convolution to the
   residual path so dimensions match for addition.
 - **_ChannelDropout:** Unlike standard dropout which zeroes individual
   neurons, this zeroes entire channels. It includes a rescale feature that
   multiplies the remaining channels by a factor
   ``total_channels / active_channels`` to maintain the expected value of the
   signal during training.
 - **_ScaledEmbedding:** This layer speeds up subject-embedding updates in
   multi-subject training. By dividing the initial weights by a scale and then multiplying
   the output by the same scale, it effectively increases the gradient
   magnitude for the embedding weights, allowing subject-specific features to
   learn faster than the shared backbone.
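
Two of the mechanisms in this list reduce to simple arithmetic, sketched below under stated assumptions. `channel_dropout` zeroes whole channels and rescales the survivors by `total_channels / active_channels`; `output_change_after_sgd_step` shows why storing a weight divided by `scale` and multiplying the output by `scale` enlarges the effective step under plain SGD. Both functions are hypothetical illustrations written for this card, not the private braindecode classes.

```python
import random


def channel_dropout(x, drop_prob, rng, rescale=True):
    """Zero whole channels of a (channels, time) signal with prob `drop_prob`."""
    keep = [rng.random() >= drop_prob for _ in x]
    n_active = sum(keep)
    if n_active == 0:
        return [[0.0] * len(ch) for ch in x]
    # Rescale survivors so the summed signal keeps the same expected value.
    factor = len(x) / n_active if rescale else 1.0
    return [
        [v * factor for v in ch] if kept else [0.0] * len(ch)
        for ch, kept in zip(x, keep)
    ]


def output_change_after_sgd_step(stored_weight, scale, grad_output, lr):
    """Change of `stored_weight * scale` after one plain SGD step."""
    grad_weight = grad_output * scale           # chain rule through the output scaling
    new_weight = stored_weight - lr * grad_weight
    return (new_weight - stored_weight) * scale


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
dropped = channel_dropout(x, drop_prob=0.5, rng=random.Random(0))
print(dropped)

# Same effective value (5.0) and same upstream gradient, with and without scaling.
plain = output_change_after_sgd_step(5.0, scale=1.0, grad_output=1.0, lr=0.1)
scaled = output_change_after_sgd_step(0.5, scale=10.0, grad_output=1.0, lr=0.1)
print(plain, scaled)  # about -0.1 and -10.0
```

Under plain SGD the effective step grows with the square of the scale; with adaptive optimizers the relationship differs, so this is only meant to convey the direction of the effect.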


 Parameters
 ----------
 hidden_dim : int, default=320
     Hidden dimension for convolutional layers. Input is projected to this
     dimension before the convolutional blocks.
 depth : int, default=10
     Number of convolutional blocks. Each block contains a dilated convolution
     with batch normalization and activation, followed by a residual connection.
 kernel_size : int, default=3
     Convolutional kernel size. Must be odd for proper padding with dilation.
 growth : float, default=1.0
     Channel size multiplier: hidden_dim * (growth ** layer_index).
     Values > 1.0 grow channels deeper; < 1.0 shrink them.
     Note: growth != 1.0 disables residual connections between layers
     with different channel sizes.
 dilation_growth : int, default=2
     Dilation multiplier per layer (e.g., 2 means dilation doubles each layer).
     Grows the receptive field exponentially with depth. Requires odd kernel_size.
 dilation_period : int, default=5
     Reset dilation to 1 every N layers. Prevents dilation from growing
     too large and maintains local connectivity.
 conv_drop_prob : float, default=0.0
     Dropout probability for convolutional layers.
 dropout_input : float, default=0.0
     Dropout probability applied to model input only.
 batch_norm : bool, default=True
     If True, apply batch normalization after each convolution.
 activation : type[nn.Module], default=nn.GELU
     Activation function class to use (e.g., nn.GELU, nn.ReLU, nn.ELU).
 n_subjects : int, default=200
     Number of unique subjects (for subject-specific pathways).
     Only used if subject_dim > 0.
 subject_dim : int, default=0
     Dimension of subject embeddings. If 0, no subject-specific features.
     If > 0, adds subject embeddings to the input before encoding.
 subject_layers : bool, default=False
     If True, apply subject-specific linear transformations to input channels.
     Each subject has its own weight matrix. Requires subject_dim > 0.
 subject_layers_dim : str, default="input"
     Where to apply subject layers: "input" or "hidden".
 subject_layers_id : bool, default=False
     If True, initialize subject layers as identity matrices.
 embedding_scale : float, default=1.0
     Scaling factor for subject embeddings learning rate.
 n_fft : int, optional
     FFT size for STFT processing. If None, no STFT is applied.
     If specified, applies spectrogram transform before encoding.
 fft_complex : bool, default=True
     If True, keep complex spectrogram. If False, use power spectrogram.
     Only used when n_fft is not None.
 channel_dropout_prob : float, default=0.0
     Probability of dropping each channel during training (0.0 to 1.0).
     If 0.0, no channel dropout is applied.
 channel_dropout_type : str, optional
     If specified with chs_info, only drop channels of this type
     (e.g., 'eeg', 'ref', 'eog'). If None and channel_dropout_prob > 0, any channel may be dropped.
 glu : int, default=2
     If > 0, applies Gated Linear Units (GLU) every N convolutional layers.
     GLUs gate intermediate representations for more expressivity.
     If 0, no GLU is applied.
 glu_context : int, default=1
     Context window size for GLU gates. If > 0, uses contextual information
     from neighboring time steps for gating. Requires glu > 0.
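
To see how `hidden_dim`, `depth`, `growth`, and `glu` interact, the sketch below lays out a per-layer plan. It assumes the channel count is `hidden_dim * growth ** layer_index` rounded to an integer, and that a GLU follows every `glu`-th layer (layers where `(k + 1) % glu == 0`); both the rounding and the exact GLU placement are assumptions made for illustration, and `layer_plan` is a hypothetical helper.

```python
def layer_plan(hidden_dim, depth, growth=1.0, glu=2):
    """Per-layer (index, channels, has_glu) under the stated assumptions."""
    plan = []
    for k in range(depth):
        channels = int(round(hidden_dim * growth ** k))
        has_glu = glu > 0 and (k + 1) % glu == 0
        plan.append((k, channels, has_glu))
    return plan


default_plan = layer_plan(hidden_dim=320, depth=10)
for k, channels, has_glu in default_plan:
    print(k, channels, "GLU" if has_glu else "")

n_glu = sum(has_glu for _, _, has_glu in default_plan)
print(n_glu)  # 5

# Shrinking channels with growth < 1.0 and no GLU.
print(layer_plan(hidden_dim=320, depth=3, growth=0.5, glu=0))
```

Under these assumptions the defaults give ten residual layers and five GLU layers, which is consistent with the figure caption's five blocks of three convolutions each.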

 References
 ----------
 .. [brainmagick] Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J. R.
    (2023). Decoding speech perception from non-invasive brain recordings. Nature
    Machine Intelligence, 5(10), 1097-1107.

 Notes
 -----
 - Input shape: (batch, n_chans, n_times)
 - Output shape: (batch, n_outputs)
 - The model uses dilated convolutions with stride=1 to maintain temporal
   resolution while achieving large receptive fields.
 - Residual connections are applied at every layer where input and output
   channels match.
 - Subject-specific features (subject_dim > 0, subject_layers) require passing
   subject indices in the forward pass as an optional parameter or via batch.
 - STFT processing (n_fft is not None) transforms the input to the spectrogram domain before encoding.

 .. versionadded:: 1.2

 .. rubric:: Hugging Face Hub integration

 When the optional ``huggingface_hub`` package is installed, all models
 automatically gain the ability to be pushed to and loaded from the
 Hugging Face Hub. Install with::

     pip install braindecode[hub]

 **Pushing a model to the Hub:**

 .. code:: python

     from braindecode.models import BrainModule

     # Train your model
     model = BrainModule(n_chans=22, n_outputs=4, n_times=1000)
     # ... training code ...

     # Push to the Hub
     model.push_to_hub(
         repo_id="username/my-brainmodule-model",
         commit_message="Initial model upload",
     )

 **Loading a model from the Hub:**

 .. code:: python

     from braindecode.models import BrainModule

     # Load pretrained model
     model = BrainModule.from_pretrained("username/my-brainmodule-model")

     # Load with a different number of outputs (head is rebuilt automatically)
     model = BrainModule.from_pretrained("username/my-brainmodule-model", n_outputs=2)

 **Extracting features and replacing the head:**

 .. code:: python

     import torch

     x = torch.randn(1, model.n_chans, model.n_times)
     # Extract encoder features (consistent dict across all models)
     out = model(x, return_features=True)
     features = out["features"]

     # Replace the classification head
     model.reset_head(n_outputs=10)

 **Saving and restoring full configuration:**

 .. code:: python

     import json

     config = model.get_config()            # all __init__ params
     with open("config.json", "w") as f:
         json.dump(config, f)

     model2 = BrainModule.from_config(config)    # reconstruct (no weights)

 All model parameters (both EEG-specific and model-specific such as
 dropout rates, activation functions, number of filters) are automatically
 saved to the Hub and restored when loading.

 See :ref:`load-pretrained-models` for a complete tutorial.</main>
</div>

## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.