OmniMouse-80M

Pretrained OmniMouse: a multi-modal, multi-task transformer for the mouse visual cortex, trained on 2.3M neurons from 73 mice across 323 sessions (150B+ neural tokens). A single model supports neural prediction, behavioral decoding, and neural forecasting.

Overview

Abstract

Scaling data and models has transformed AI. Does the same hold for brain modeling? We train multi-modal, multi-task models on 2.3 million neurons from 73 mice (150B+ neural tokens), flexibly supporting neural prediction, behavioral decoding, and neural forecasting. OmniMouse achieves state-of-the-art performance, outperforming specialized baselines across virtually all regimes. Yet performance continues to improve with more data while gains from larger models saturate, inverting the standard AI scaling story: brain models remain data-limited even with vast recordings.

Usage

This repo stores the training checkpoint as per-rank shards (rank_*.ckpt), loaded through omnimouse.utils.ModelCheckpoint. The loader handles multi-rank → single-rank consolidation transparently.

from huggingface_hub import snapshot_download

# Download the rank_*.ckpt shards; returns the local cache path.
ckpt_dir = snapshot_download(
    repo_id="the-enigma-project/omnimouse-80M",
    allow_patterns=["rank_*.ckpt"],
)
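
To load the weights programmatically, a minimal sketch follows; the ModelCheckpoint constructor and load() call shown here are assumptions about the API, so check omnimouse.utils for the actual signature.

from omnimouse.utils import ModelCheckpoint

# Assumed API (verify against omnimouse.utils): construct from the shard
# directory, then load() returns the consolidated single-rank state dict.
checkpoint = ModelCheckpoint(ckpt_dir)
state_dict = checkpoint.load()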

Pass ckpt_dir as ckpt_path to the standard eval pipeline:

uv run eval experiment=omnimouse_80M ckpt_path=<ckpt_dir>

Alternatively, wire it into a notebook via notebooks/inference.ipynb, which walks through the end-to-end flow (config composition, dataloader setup, checkpoint load, evaluation).

The matching Hydra experiment config is configs/experiment/omnimouse_80M.yaml. The session-specific readouts (sess_params) in the checkpoint are tied to the training corpus, so when fine-tuning on new sessions, load with strict=False to let unmatched readouts initialize fresh; a sketch of the non-strict load follows.
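
As a toy illustration of the strict=False behavior (standard PyTorch, not the OmniMouse loading code): keys present in the model but absent from the checkpoint are reported as missing and keep their fresh initialization, which is exactly what new-session readouts need.

import torch.nn as nn

# Toy example: the "new" model has an extra readout head that the pretrained
# state dict lacks, so strict=False leaves that head freshly initialized.
pretrained = nn.Sequential(nn.Linear(8, 8))
new_model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4))
missing, unexpected = new_model.load_state_dict(pretrained.state_dict(), strict=False)
print(missing)     # ['1.weight', '1.bias'], left at fresh initialization
print(unexpected)  # []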

Model family

Architecture

OmniMouse follows an encode–fuse–decode design. A cross-attention encoder with learned latent queries compresses unmasked neural and behavioral tokens into a fixed-length latent sequence, using local sliding-window attention plus a few global "register" tokens. A fusion stack then integrates these latents with video features through transformer layers that interleave sliding-window and global self-attention at a 5:1 ratio. All attention blocks use 1D RoPE derived from each token's timestamp, and a shared cross-attention decoder reads out neuronal and behavioral targets from the fused representation.
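
A schematic sketch of the fusion stack's 5:1 interleaving, built from vanilla PyTorch layers; dimensions, window size, layer count, and the mask-based sliding window are illustrative assumptions, and the timestamp-derived RoPE is omitted.

import torch
import torch.nn as nn

def sliding_window_mask(n, window):
    # Boolean (n, n) mask; True marks pairs outside the local window,
    # which nn.TransformerEncoderLayer treats as "may not attend".
    idx = torch.arange(n)
    return (idx[None, :] - idx[:, None]).abs() > window // 2

class FusionStack(nn.Module):
    # Illustrative only: 5 sliding-window self-attention layers per 1 global
    # layer, as described above. The released model also applies 1D RoPE from
    # token timestamps inside each attention block (not shown here).
    def __init__(self, dim=512, n_heads=8, n_layers=12, window=128, ratio=5):
        super().__init__()
        self.window, self.ratio = window, ratio
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x):  # x: (batch, seq, dim)
        local = sliding_window_mask(x.size(1), self.window).to(x.device)
        for i, layer in enumerate(self.layers):
            is_global = (i % (self.ratio + 1)) == self.ratio  # every 6th layer
            x = layer(x, src_mask=None if is_global else local)
        return x

x = FusionStack()(torch.randn(2, 64, 512))  # (2, 64, 512)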

Training data

2.3M+ single neurons from the visual cortex of 73 mice across 323 sessions, totaling 150B+ neural tokens. Mice viewed naturalistic movies (cinematic clips, Sports-1M), ImageNet images, and parametric stimuli (Gabors, random dot kinematograms, pink noise, model-generated stimuli) while running on a wheel. Pupil position, pupil dilation and its derivative, and running speed were recorded throughout.

The dataset is available at the-enigma-project/omnimouse-dataset.
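
The dataset repo can be fetched the same way as the checkpoint. snapshot_download is the standard huggingface_hub call; which files to pull (allow_patterns) depends on the dataset card, so none are assumed here.

from huggingface_hub import snapshot_download

# Download the dataset repo; returns the local cache path.
data_dir = snapshot_download(
    repo_id="the-enigma-project/omnimouse-dataset",
    repo_type="dataset",
)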

Benchmarks

Single-trial Pearson correlation on the held-out test set (full 323-session corpus).

| Model | Params | Forecast | Forecast + Stim | Population | Pop. + Stim | Gaze | Pupil | Running |
|---|---|---|---|---|---|---|---|---|
| OmniMouse-80M | 80M | 0.25 | 0.34 | 0.29 | 0.37 | 0.80 | 0.76 | 0.75 |
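
For reference, a minimal numpy sketch of the metric: Pearson correlation between predicted and recorded single-trial responses, computed per neuron and averaged. The exact aggregation across neurons and sessions used in the paper is an assumption here.

import numpy as np

def single_trial_pearson(pred, target, eps=1e-8):
    # pred, target: (n_timepoints, n_neurons) single-trial responses.
    pred = pred - pred.mean(axis=0)
    target = target - target.mean(axis=0)
    num = (pred * target).sum(axis=0)
    denom = np.sqrt((pred ** 2).sum(axis=0) * (target ** 2).sum(axis=0))
    return (num / np.maximum(denom, eps)).mean()  # mean over neurons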


Citation

@inproceedings{willeke2026omnimouse,
  title     = {OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens},
  author    = {Konstantin Friedrich Willeke and Polina Turishcheva and Alex Gilbert and Goirik Chakrabarty and Hasan Atakan Bedel and Paul G. Fahey and Yongrong Qiu and Marissa A. Weis and Michaela Vystr{\v{c}}ilov{\'a} and Taliah Muhammad and Lydia Ntanavara and Rachel E Froebe and Kayla Ponder and Zheng Huan Tan and Emin Orhan and Erick Cobos and Sophia Sanborn and Katrin Franke and Fabian H. Sinz and Alexander S. Ecker and Andreas S. Tolias},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=mEw4lhAn0F}
}