---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- foundation-model
- transformer
---

# REVE

**R**epresentation for **E**EG with **V**ersatile **E**mbeddings (REVE) from El Ouahidi et al. (2025).

> **Architecture-only repository.** This repo documents the
> `braindecode.models.REVE` class. **No pretrained weights are
> distributed here**; instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import REVE

model = REVE(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)
```

The signal-shape arguments above are example defaults; adjust them
to match your recording.
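
For instance, a minimal forward pass with random data (shapes follow the
arguments above; depending on your braindecode version, REVE may also accept
channel-position information such as `chs_info` so the 4D positional encoding
can look up electrode coordinates):

```python
import torch

from braindecode.models import REVE

# 22 channels, 4 s at 250 Hz -> 1000 samples per window.
model = REVE(n_chans=22, sfreq=250, input_window_seconds=4.0, n_outputs=4)
x = torch.randn(8, 22, 1000)  # (batch, n_chans, n_times)
out = model(x)
print(out.shape)  # expected: torch.Size([8, 4])
```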

## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.REVE.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/reve.py#L35>

## Architecture description

The section below is adapted from the rendered class docstring (parameters,
references, architecture figure where available).

**R**epresentation for **E**EG with **V**ersatile **E**mbeddings (REVE) from El Ouahidi et al. (2025) [reve].

*Tags: Foundation Model, Attention/Transformer*

![REVE training pipeline overview](https://brain-bzh.github.io/reve/static/images/architecture.png)

Foundation models have transformed machine learning by reducing reliance on
task-specific data and induced biases through large-scale pretraining. While
successful in language and vision, their adoption in EEG has lagged due to the
heterogeneity of public datasets, which are collected under varying protocols,
devices, and electrode configurations. Existing EEG foundation models struggle
to generalize across these variations, often restricting pretraining to a single
setup and resulting in suboptimal performance, particularly under linear probing.

REVE is a pretrained model explicitly designed to generalize across diverse EEG signals. It introduces
a **4D positional encoding** scheme that enables processing signals of arbitrary length and electrode
arrangement. Using a masked autoencoding objective, REVE was pretrained on over **60,000 hours** of EEG
data from **92 datasets** spanning **25,000 subjects**, the largest EEG pretraining effort to date.

**Channel-Invariant Positional Encoding**

Prior EEG foundation models (`braindecode.models.Labram`, `braindecode.models.BIOT`) rely on
fixed positional embeddings, making direct transfer to unseen electrode layouts infeasible. CBraMod uses
convolution-based positional encoding that requires fine-tuning when adapting to new configurations.
As noted in the CBraMod paper: *"fixing the pre-trained parameters during training on downstream
datasets will lead to a very large performance decline."*

REVE's 4D positional encoding jointly encodes spatial $(x, y, z)$ and temporal $(t)$ positions
using Fourier embeddings, enabling true cross-configuration transfer without retraining. The Fourier
embedding is inspired by the brain module of Défossez et al. [brainmodule], generalized to 4D for EEG
with the channel spatial coordinates and temporal patch index.

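As an illustration, a minimal sketch of a Fourier-style positional embedding
over $(x, y, z, t)$ coordinates; the function below is a simplified assumption
for exposition, not the exact braindecode implementation:

```python
import torch

def fourier_4d_embedding(coords: torch.Tensor, n_freqs: int = 4) -> torch.Tensor:
    """Sinusoidal embedding of (x, y, z, t) token coordinates.

    coords: (n_tokens, 4) spatial positions plus temporal patch index.
    Returns: (n_tokens, 4 * 2 * n_freqs) embedding.
    """
    freqs = 2.0 ** torch.arange(n_freqs)  # geometric frequency ladder
    angles = coords[..., None] * freqs    # (n_tokens, 4, n_freqs)
    emb = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return emb.flatten(start_dim=-2)

# Two electrodes x two temporal patches -> four tokens.
coords = torch.tensor([[0.1, 0.2, 0.9, 0.0],
                       [0.1, 0.2, 0.9, 1.0],
                       [-0.3, 0.5, 0.8, 0.0],
                       [-0.3, 0.5, 0.8, 1.0]])
print(fourier_4d_embedding(coords).shape)  # torch.Size([4, 32])
```
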
**Linear Probing Performance**

A key advantage of REVE is that it produces useful latent representations without heavy fine-tuning.
Under linear probing (frozen encoder), REVE achieves state-of-the-art results on downstream EEG tasks.
This enables practical deployment in low-data scenarios where extensive fine-tuning is not feasible.

**Architecture**

The model adopts modern Transformer components validated through ablation studies:

- **Normalization**: RMSNorm outperforms LayerNorm;
- **Activation**: GEGLU outperforms GELU;
- **Attention**: Flash Attention via PyTorch's SDPA;
- **Masking ratio**: 55% is optimal for spatio-temporal block masking.

These choices align with best practices from large language models and were empirically validated
on EEG data.

**Secondary Loss**

A secondary reconstruction objective using attention pooling across layers prevents over-specialization
in the final layer. This pooling acts as an information bottleneck, forcing the model to distill key
information from the entire sequence. Ablations show this loss is crucial for linear probing quality:
removing it drops average performance by 10% under the frozen evaluation.

**Macro Components**

- `REVE.to_patch_embedding` **Patch Tokenization**

  The EEG signal is split into overlapping patches along the time dimension, generating
  $p = \left\lceil \frac{T - w}{w - o} \right\rceil + \mathbb{1}\left[(T - w) \bmod (w - o) \neq 0\right]$
  patches of size $w$ with overlap $o$, where $T$ is the signal length.
  Each patch is linearly projected to the embedding dimension (a worked
  example follows this list).

- `REVE.fourier4d` + `REVE.mlp4d` **4D Positional Embedding (4DPE)**

  The 4DPE encodes each token's 4D coordinates $(x, y, z, t)$, where $(x, y, z)$ are the
  3D spatial coordinates from a standardized electrode position bank and $t$ is the temporal
  patch index. The encoding combines:

  1. **Fourier embedding**: sinusoidal encoding across multiple frequencies for smooth
     interpolation to unseen positions
  2. **MLP embedding**: `torch.nn.Linear` (4 → embed_dim) → `torch.nn.GELU` →
     `torch.nn.LayerNorm` for learnable refinement

  Both components are summed and normalized. The 4DPE adds negligible computational overhead,
  scaling linearly with the number of tokens.

- `REVE.transformer` **Transformer Encoder**

  Pre-norm Transformer using RMSNorm (`torch.nn.RMSNorm`), multi-head self-attention,
  feed-forward networks (GEGLU activation), and residual connections. Default configuration:
  22 layers, 8 heads, 512 embedding dimension (~72M parameters).

- `REVE.final_layer` **Classification Head**

  Two modes (controlled by the `attention_pooling` parameter):

  - When `attention_pooling` is disabled (e.g., `None` or `False`): flatten all tokens
    → `torch.nn.LayerNorm` → `torch.nn.Linear`
  - When `attention_pooling` is enabled: attention pooling with a learnable query token
    attending to all encoder outputs

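A quick worked check of the patch-count formula, implementing it exactly as
stated above with the docstring defaults ($w = 200$, $o = 20$); the choice of
a 10-second window is an arbitrary example:

```python
import math

def n_patches(T: int, w: int = 200, o: int = 20) -> int:
    """p = ceil((T - w) / (w - o)) + 1[(T - w) mod (w - o) != 0]."""
    stride = w - o
    return math.ceil((T - w) / stride) + int((T - w) % stride != 0)

for T in (2000, 2050):  # 10 s and 10.25 s at 200 Hz
    print(T, n_patches(T))
# 2000 -> 10 (stride divides T - w exactly, so the indicator term is 0)
# 2050 -> 12 (ceil rounds up and the indicator adds one more patch)
```
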
**Known Limitations**

- **Sparse electrode setups**: Performance degrades with very few channels. On motor imagery,
  accuracy drops from 0.824 (64 channels) to 0.660 (1 channel). For tasks requiring broad
  spatial coverage (e.g., imagined speech), performance with fewer than 4 channels approaches
  chance level.
- **Demographic bias**: The pretraining corpus aggregates publicly available datasets, most
  originating from North America and Europe, resulting in limited demographic diversity.
  More details about the datasets used for pretraining can be found in the REVE paper [reve].

**Pretrained Weights**

Weights are available on [Hugging Face](https://huggingface.co/collections/brain-bzh/reve),
but you must agree to the data usage terms before downloading:

- `brain-bzh/reve-base`: 72M parameters, 512 embedding dim, 22 layers (~260 A100 GPU hours)
- `brain-bzh/reve-large`: ~400M parameters, 1250 embedding dim

> **Important: Pre-trained Weights Available (Registration Required)**
>
> This model has pre-trained weights available on the Hugging Face Hub.
> **You must first register and agree to the data usage terms on the authors'
> HuggingFace repository before you can access the weights.**
> [Link here](https://huggingface.co/collections/brain-bzh/reve).
>
> Loading the weights (and pushing your own trained model) is sketched below;
> Hub integration requires installing `braindecode[hub]`.

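A minimal sketch of loading the gated checkpoint, assuming the mixin-style Hub
interface (`from_pretrained` / `push_to_hub` from `huggingface_hub`) that
braindecode models gain when the Hub extra is installed; the exact entry point
may differ across braindecode versions:

```python
from braindecode.models import REVE

# Requires `pip install braindecode[hub]` and prior acceptance of the data
# usage terms on the gated brain-bzh repositories.
model = REVE.from_pretrained("brain-bzh/reve-base")

# Push your own trained model to the Hub (placeholder repo id).
model.push_to_hub("your-username/reve-finetuned")
```
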
**Usage**

> **Warning**
>
> Input data must be sampled at **200 Hz** to match pretraining. The model applied
> z-score normalization followed by clipping at 15 standard deviations internally
> during pretraining; users should apply similar preprocessing.

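A sketch of matching preprocessing in NumPy; resampling to 200 Hz is left to
your EEG toolchain (e.g. MNE's `Raw.resample`), and the per-channel statistics
used here are an assumption, since the docstring does not specify the
normalization axis:

```python
import numpy as np

def preprocess(x: np.ndarray, clip_sigma: float = 15.0) -> np.ndarray:
    """Z-score per channel, then clip at +/- clip_sigma standard deviations.

    x: (n_chans, n_times) EEG already resampled to 200 Hz.
    """
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True) + 1e-8  # guard against flat channels
    z = (x - mean) / std
    return np.clip(z, -clip_sigma, clip_sigma)

x = np.random.randn(22, 2000)  # 22 channels, 10 s at 200 Hz
x_ready = preprocess(x)
```
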
### Parameters

- **embed_dim** (int, default=512): Embedding dimension. Use 512 for REVE-Base, 1250 for REVE-Large.
- **depth** (int, default=22): Number of Transformer layers.
- **heads** (int, default=8): Number of attention heads.
- **head_dim** (int, default=64): Dimension per attention head.
- **mlp_dim_ratio** (float, default=2.66): FFN hidden dimension ratio: `mlp_dim = embed_dim × mlp_dim_ratio`.
- **use_geglu** (bool, default=True): Use GEGLU activation (recommended) or standard GELU.
- **freqs** (int, default=4): Number of frequencies for the Fourier positional embedding.
- **patch_size** (int, default=200): Temporal patch size in samples (200 samples = 1 second at 200 Hz).
- **patch_overlap** (int, default=20): Overlap between patches in samples.
- **attention_pooling** (bool, default=False): Pooling strategy for aggregating transformer outputs
  before classification. If `False` (default), all tokens are flattened into a single vector of size
  `n_chans × n_patches × embed_dim`, which is then passed through LayerNorm and a linear classifier.
  If `True`, uses attention-based pooling with a learnable query token that attends to all encoder
  outputs, producing a single embedding of size `embed_dim`. Attention pooling is more
  parameter-efficient for long sequences and variable-length inputs.

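For example, configurations matching the two pooling modes described above
(whether these hyperparameters exactly reproduce the released checkpoints is
an assumption):

```python
from braindecode.models import REVE

# REVE-Base-like setup (~72M parameters) with the flattened head.
base = REVE(n_chans=22, sfreq=200, input_window_seconds=10.0, n_outputs=4,
            embed_dim=512, depth=22, heads=8, attention_pooling=False)

# Attention pooling: a learnable query summarizes all tokens, so the head
# size stays independent of n_chans and the number of patches.
pooled = REVE(n_chans=22, sfreq=200, input_window_seconds=10.0, n_outputs=4,
              attention_pooling=True)
```
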
### References

- **[reve]** El Ouahidi, Y., Lys, J., Thölke, P., Farrugia, N., Pasdeloup, B.,
  Gripon, V., Jerbi, K. & Lioi, G. (2025). REVE: A Foundation Model for EEG -
  Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects.
  The Thirty-Ninth Annual Conference on Neural Information Processing Systems.
  <https://openreview.net/forum?id=ZeFMtRBy4Z>
- **[brainmodule]** Défossez, A., Caucheteux, C., Rapin, J., Kabeli, O., & King, J. R.
  (2023). Decoding speech perception from non-invasive brain recordings. Nature
  Machine Intelligence, 5(10), 1097-1107.

### Notes

The position bank is downloaded from HuggingFace on first initialization, mapping
standard 10-20/10-10/10-05 electrode names to 3D coordinates. This enables the
4D positional encoding to generalize across electrode configurations without
requiring matched layouts between pretraining and downstream tasks.

**Hugging Face Hub integration**

When the optional `huggingface_hub` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with:

```bash
pip install braindecode[hub]
```

Typical Hub operations are pushing a trained model, loading it back, extracting
features by replacing the head, and saving and restoring the full configuration;
see the combined sketch below.

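A hedged sketch of these operations, assuming the `PyTorchModelHubMixin`-style
interface (`push_to_hub` / `from_pretrained`) from `huggingface_hub`; repo ids
are placeholders, and the head attribute name follows the `REVE.final_layer`
component described above:

```python
import torch

from braindecode.models import REVE

model = REVE(n_chans=22, sfreq=200, input_window_seconds=10.0, n_outputs=4)

# Push a trained model to the Hub (placeholder repo id).
model.push_to_hub("your-username/my-reve-model")

# Load it back; the saved configuration is restored automatically.
restored = REVE.from_pretrained("your-username/my-reve-model")

# Feature extraction: replace the classification head with identity.
restored.final_layer = torch.nn.Identity()
features = restored(torch.randn(1, 22, 2000))
```
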
All model parameters (both EEG-specific and model-specific, such as
dropout rates, activation functions, and number of filters) are automatically
saved to the Hub and restored when loading.

See the braindecode *load pretrained models* tutorial for a complete walkthrough.

## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.