| --- |
| license: mit |
| language: |
| - en |
| - ko |
| tags: |
| - brain-computer-interface |
| - foundation-model |
| - neural-decoding |
| - transformer |
| - cross-modal |
| - bci |
| - electrophysiology |
| - emg |
| - spike-trains |
| pipeline_tag: feature-extraction |
| library_name: pytorch |
| metrics: |
| - r-squared |
| --- |
| |
| <!-- Training data: DANDI Archive Dandiset 000941 (Monkey L motor cortex, 4 sessions, ~3h 38min). External dataset, not a Hugging Face Hub dataset. See model_card.md §Training Data for full details. --> |
| |
| |
| # CortexFM — A Lightweight Multimodal Foundation Model for Spike + EMG BCI |
| |
| A 5.04 M-parameter multimodal Transformer foundation model that jointly learns spike trains and intramuscular EMG envelopes from public DANDI motor-cortex data and is evaluated on the FALCON M1 benchmark. It is intended as a lightweight BCI backbone. |
| |
| --- |
| |
| ## Model description |
| |
| CortexFM is a small, public, and fully reproducible foundation model for invasive brain–computer interface (BCI) decoding. It targets the regime where private million-hour pretraining data and 45 M – 350 M parameter backbones are *not* available, and asks how far neural-decoding quality can be pushed with **~3.6 hours of public data** (3 h 38 min) and a **~5 M-parameter** model trained in **about six minutes on a single consumer GPU**. |
| |
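| CortexFM follows three design principles: (i) preserving per-unit and per-muscle identity, (ii) reproducibility from public data and public benchmarks alone, and (iii) alignment with the FALCON standard. The backbone is a 10-layer × 6-head × d = 192 PreNorm Transformer (4.45 M params, FLASH SDPA). The heads form three branches: spike reconstruction with Poisson NLL, EMG reconstruction with MSE, and cross-modal InfoNCE contrastive learning. |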
| |
| ### Architecture summary |
| |
| | Component | Configuration | |
| |---|---| |
| | Backbone | PreNorm Transformer, 10 layers, 6 heads, d_model = 192, FFN = 768, GELU | |
| | Attention | SDPA with FLASH / EFFICIENT backends (PyTorch 2.10) | |
| | Backbone params | 4,449,024 (≈ 4.45 M) | |
| | Spike tokenizer | Per-unit learned embedding ⊕ log(1 + α · count) + temporal positional embedding | |
| | EMG tokenizer | Per-muscle learned embedding ⊕ scalar-to-vector MLP + temporal positional embedding | |
| | Heads | Spike recon (Poisson NLL), EMG recon (MSE), Contrastive projector (d_p = 128) | |
| | Total params | 5,044,994 (≈ 5.04 M) | |
| | Bin size | 20 ms (FALCON official) | |
| | Context length T | 64 bins (1.28 s) → 1,088 tokens | |
| | Mixed precision | BF16 (InfoNCE softmax promoted to FP32) | |
| |
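The backbone parameter count in the table can be reproduced by hand. The sketch below assumes bias terms on every linear layer, two LayerNorms per block, and one final LayerNorm after the last block. These are assumptions about the layer layout, not details stated in this card, and `prenorm_transformer_params` is an illustrative helper rather than part of the CortexFM code.

```python
def prenorm_transformer_params(n_layers: int, d_model: int, d_ffn: int) -> int:
    """Count weights and biases of a PreNorm Transformer encoder stack."""
    # Q, K, V and output projections: each d_model x d_model plus a bias vector.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward network: d_model -> d_ffn -> d_model, with biases.
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    # Two LayerNorms per block (before attention and before the FFN), scale + shift.
    norms = 2 * (2 * d_model)
    # Assumed: one extra LayerNorm after the last block.
    final_norm = 2 * d_model
    return n_layers * (attention + ffn + norms) + final_norm


backbone = prenorm_transformer_params(n_layers=10, d_model=192, d_ffn=768)
print(f"{backbone:,}")  # 4,449,024
```

Under these assumptions the count matches the reported 4,449,024, which leaves 5,044,994 − 4,449,024 = 595,970 parameters for the tokenizers and the three heads.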
| ### Intended uses |
| |
| - **Research-grade neural decoding** of primate motor cortex (M1) spike trains into 16-channel intramuscular EMG envelopes. |
| - **Backbone for downstream BCI probes**: as a frozen feature extractor with a thin (~3 K-param) per-session output-space affine adapter, CortexFM enables session-1 adaptation to held-out recording days. |
| - **Cross-modal pretraining baseline** for studies that compare per-unit tokenization against patch-tokenized BCI foundation models (e.g., NDT-3) at a 1/9 – 1/69 parameter ratio. |
| - **Educational reference** for compact (~5 M-param) foundation-model training from public data on a single consumer GPU. |
| |
| ### Out-of-scope uses |
| |
| - **Clinical or assistive deployment**. This is a research checkpoint trained on a single non-human primate (MonkeyL, DANDI 000941). It is **not** intended for human BCI control or medical decision-making. |
| - **Cross-subject generalization**. The pretraining set is one subject; cross-subject transfer (e.g., MonkeyN, MC_Maze, human cortex) has not been validated. |
| - **Direct kinematic decoding**. The model outputs EMG envelopes; downstream kinematic readouts require an additional decoding stage. |
| - **Real-time control without calibration**. Held-out sessions require a brief (≥ ~8 s) per-session affine calibration to enter the positive-R² regime. |
|
|
| --- |
|
|
| ## Training data |
|
|
| | Dataset | DOI | Subject | Modality | Sessions | |
| |---|---|---|---|---| |
| | **DANDI:000941** (Rouse & Schieber 2018) | [10.48324/dandi.000941/0.211015.0907](https://dandiarchive.org/dandiset/000941) | MonkeyL (1 NHP) | M1 spikes (64 units) + intramuscular EMG (16 muscles) | 11 sessions total | |
|
|
| Pretraining uses the **four held-in calibration sessions** of DANDI 000941 (sessions 20120924, 20120926, 20120927, 20120928), totaling **3 h 38 min** of paired spike + EMG recordings. The remaining 7 sessions (4 minival + 3 held-out calibration) are reserved for FALCON M1 evaluation and OOD session-1 adaptation. |
|
|
| License: CC-BY-4.0 (DANDI public release). |
|
|
| ### Preprocessing pipeline |
|
|
| - **EMG**: 60/180/200/300/400 Hz notch → 4th-order Butterworth high-pass at 65 Hz → rectify → 99 % clip → 95 % normalize → polyphase resample (1 kHz → 50 Hz) → re-rectify → 10 Hz low-pass envelope. |
| - **Spike**: 20 ms bin counts per unit on the same time grid. |
| - Output: Zarr store with `/emg/envelope`, `/spike/counts`, `/eval_mask`, and trial markers. Spike/EMG share a common 20 ms bin axis (FALCON official invariant). |
|
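The spike step amounts to counting events on a fixed 20 ms grid. A minimal sketch for one unit follows. `bin_spike_times` and its arguments are hypothetical names, and the released pipeline may treat bin edges and trial boundaries differently.

```python
def bin_spike_times(spike_times_s, t_start_s, n_bins, bin_size_s=0.020):
    """Count the spikes of one unit in consecutive bins of width bin_size_s."""
    counts = [0] * n_bins
    for t in spike_times_s:
        # Floor division maps a spike time to the index of the bin containing it.
        index = int((t - t_start_s) // bin_size_s)
        # Spikes before t_start_s or after the last bin are dropped.
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Five 20 ms bins covering 0.0 s to 0.1 s; the spike at 0.5 s falls outside.
counts = bin_spike_times([0.001, 0.015, 0.025, 0.079, 0.081, 0.5], 0.0, n_bins=5)
print(counts)  # [2, 1, 0, 1, 1]
```

Applying the same grid to the 50 Hz EMG envelope is what keeps the two modalities on a shared bin axis.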
|
| --- |
|
|
| ## Training procedure |
|
|
| ### Objectives |
|
|
| Joint loss with three components: |
|
|
| $$ |
| \mathcal{L}_{\text{total}} = w_{\text{spike}} \cdot \mathcal{L}_{\text{spike}} + w_{\text{emg}} \cdot \mathcal{L}_{\text{emg}} + w_{\text{cont}} \cdot \mathcal{L}_{\text{cont}} |
| $$ |
| |
| - $\mathcal{L}_{\text{spike}}$: Poisson NLL over per-unit spike counts. |
| - $\mathcal{L}_{\text{emg}}$: MSE on per-muscle EMG envelopes. |
| - $\mathcal{L}_{\text{cont}}$: Symmetric InfoNCE on pooled cross-modal embeddings (FP32-promoted), temperature τ = 0.1. |
|
|
| Loss weights $(w_{\text{spike}}, w_{\text{emg}}, w_{\text{cont}}) = (1.0, 1.0, 0.5)$. |
| |
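The three terms and their weighting can be sketched without any framework. The version below is a dependency-free illustration, not the training code: the Poisson term drops the constant log(k!), and the InfoNCE term assumes L2-normalized embeddings compared by dot product, which the card does not state. All function names are hypothetical.

```python
import math


def poisson_nll(log_rates, counts):
    """Mean Poisson negative log-likelihood, omitting the constant log(k!) term."""
    terms = [math.exp(lr) - k * lr for lr, k in zip(log_rates, counts)]
    return sum(terms) / len(terms)


def mse(pred, target):
    """Mean squared error between predicted and target envelope values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where row i should select column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(spike_emb, emg_emb, tau=0.1):
    """Symmetric InfoNCE over paired pooled embeddings (one pair per sample)."""
    s = [_normalize(v) for v in spike_emb]
    e = [_normalize(v) for v in emg_emb]
    logits = [[sum(a * b for a, b in zip(si, ej)) / tau for ej in e] for si in s]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the spike-to-EMG and EMG-to-spike directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


def total_loss(l_spike, l_emg, l_cont, w_spike=1.0, w_emg=1.0, w_cont=0.5):
    """Weighted sum of the three objectives with the weights listed above."""
    return w_spike * l_spike + w_emg * l_emg + w_cont * l_cont


l_spike = poisson_nll(log_rates=[0.0, 0.7, -1.0], counts=[1, 2, 0])
l_emg = mse(pred=[0.2, 0.5], target=[0.1, 0.7])
l_cont = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])
print(total_loss(l_spike, l_emg, l_cont))
```

With τ = 0.1 the logits of well-aligned pairs are ten times their cosine similarity, so the contrastive term falls close to zero once matched spike and EMG embeddings point in the same direction.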
| ### Masking |
| |
| Spike and EMG tokens are independently masked at **50 %** per bin. Each modality must be reconstructed from its own unmasked tokens and from the (independently masked) tokens of the other modality. |
| |
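A small sketch of the sampling step may help. It reads "per bin" as one Bernoulli draw per time bin and modality, which is an assumption; token-level masking would be equally consistent with the description. `sample_bin_masks` is a hypothetical helper.

```python
import random


def sample_bin_masks(n_bins, mask_ratio=0.5, seed=None):
    """Draw independent Bernoulli masks over time bins for spikes and EMG.

    True marks a bin whose tokens are hidden and must be reconstructed.
    """
    rng = random.Random(seed)
    spike_mask = [rng.random() < mask_ratio for _ in range(n_bins)]
    emg_mask = [rng.random() < mask_ratio for _ in range(n_bins)]
    return spike_mask, emg_mask


spike_mask, emg_mask = sample_bin_masks(n_bins=64, seed=0)
# Bins hidden in both modalities at once: about a quarter of bins on average,
# because the two masks are drawn independently at 50 % each.
both_hidden = sum(s and e for s, e in zip(spike_mask, emg_mask))
print(sum(spike_mask), sum(emg_mask), both_hidden)
```

Because the draws are independent, roughly half of the bins hidden in one modality remain visible in the other, which is what lets the cross-modal signal contribute to reconstruction.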
| ### Optimization |
| |
| | Hyperparameter | Value | |
| |---|---| |
| | Optimizer | AdamW | |
| | Learning rate | 3 × 10⁻⁴ | |
| | Weight decay | 0.01 | |
| | LR schedule | Linear warmup (500 steps) → cosine decay | |
| | Batch size | 8 | |
| | Context length T | 64 bins | |
| | Mixed precision | BF16 (InfoNCE softmax in FP32) | |
| | Gradient clip | 1.0 | |
| | Max epochs | 30 (best validation loss at epoch 28) | |
| |
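The learning-rate schedule in the table can be written as a single function of the step index. In this sketch the total step count and the floor `min_lr` are assumptions (the card gives neither), and the exact warmup formula may differ from the one used in training.

```python
import math


def lr_at_step(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length of 10,000 optimizer steps.
for step in (0, 499, 500, 5_000, 10_000):
    print(step, lr_at_step(step, total_steps=10_000))
```

The peak of 3 × 10⁻⁴ is reached at the end of warmup and the rate then falls smoothly to `min_lr` by the final step.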
| ### Training environment |
| |
| | Item | Value | |
| |---|---| |
| | GPU | NVIDIA RTX 5080 (16 GB GDDR7, sm_120) — single consumer card | |
| | OS / runtime | WSL2 Ubuntu 24.04 | |
| | Framework | PyTorch 2.10.0 + cu128, PyTorch Lightning | |
| | Wall-clock training time | **≈ 6 minutes** for 30 epochs | |
| | Best checkpoint | `epoch28-0.2599.ckpt` (60.7 MB, val_loss = 0.2599) | |
| | External cloud GPU | None — fully on-device | |
| |
| The train/validation loss gap stayed below 0.03 throughout, so no early stopping was applied; the checkpoint with the lowest validation loss was kept unchanged. |
| |
| --- |
| |
| ## Evaluation |
| |
| ### FALCON M1 (held-in calibration sessions, variance-weighted R² over 16 muscles) |
| |
| | Setting | Params (used) | Per-session R² (mean ± std) | Pooled R² | NL | |
| |---|---|---|---|---| |
| | POYO-1 zero-EMG floor | 15.47 M (0 used) | −1.273 ± 0.299 | — | 2.4e-5 | |
| | POYO-1 frozen + per-session affine | 15.47 M + ~2 K/sess | +0.451 ± 0.112 | +0.498 | — | |
| | **CortexFM zero-shot** | **5.04 M** | **−1.035 ± 0.234** | — | **0.131** | |
| | **CortexFM frozen + Ridge linear probe** | **5.04 M + ~3 K** | **−0.258 ± 0.327** | **+0.125** | — | |
| | **CortexFM + EMG-head FT 200 step (ZS)** | **5.04 M (FT 37 K = 0.75 %)** | **−0.038 ± 0.063** | — | — | |
| | **CortexFM frozen + per-session affine** | **5.04 M + ~3 K/sess** | **+0.484 ± 0.102** | **+0.529** | — | |
| |
| NL = FALCON normalized latency (inference time / data duration). |
| |
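For readers unfamiliar with the metric, variance-weighted R² pools residual and total sums of squares over all channels before forming the ratio, which is the same quantity scikit-learn reports for `r2_score(..., multioutput="variance_weighted")`. The sketch below is a plain-Python illustration under that standard definition; it is not the FALCON evaluation code, and `variance_weighted_r2` is a hypothetical name.

```python
def variance_weighted_r2(y_true, y_pred):
    """R² over channels, weighting each channel by its variance.

    y_true, y_pred: lists of time bins, each a list of per-channel values.
    """
    n_bins = len(y_true)
    n_channels = len(y_true[0])
    ss_res = 0.0
    ss_tot = 0.0
    for c in range(n_channels):
        mean_c = sum(row[c] for row in y_true) / n_bins
        ss_res += sum((yt[c] - yp[c]) ** 2 for yt, yp in zip(y_true, y_pred))
        ss_tot += sum((yt[c] - mean_c) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


target = [[1.0], [2.0], [3.0]]
print(variance_weighted_r2(target, target))                   # 1.0
print(variance_weighted_r2(target, [[2.0], [2.0], [2.0]]))    # 0.0
print(variance_weighted_r2(target, [[0.0], [0.0], [0.0]]))    # -6.0
```

The last line shows why scores below zero are possible: a prediction that misses the channel mean can be worse than the constant-mean predictor that defines R² = 0.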
| ### Auxiliary co-bps (CortexFM only) |
| |
| Mean 0.756 ± 0.128 bits/spike above per-unit mean-rate baseline on the four held-in calibration files. |
| |
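Bits per spike expresses how much a predicted rate improves the Poisson log-likelihood over a constant mean-rate baseline, normalized by the spike count. The sketch below shows the computation for a single unit. Function names are hypothetical, rates are expected counts per bin and must be positive, and the selection of units and bins used for the reported figure is not reproduced here.

```python
import math


def poisson_log_likelihood(rates, counts):
    """Poisson log-likelihood summed over bins, omitting the log(k!) term."""
    return sum(k * math.log(r) - r for r, k in zip(rates, counts))


def bits_per_spike(pred_rates, counts):
    """Log-likelihood gain over the unit's mean-rate baseline, in bits per spike."""
    n_spikes = sum(counts)
    mean_rate = n_spikes / len(counts)
    ll_model = poisson_log_likelihood(pred_rates, counts)
    ll_base = poisson_log_likelihood([mean_rate] * len(counts), counts)
    # The log(k!) terms cancel in the difference; ln 2 converts nats to bits.
    return (ll_model - ll_base) / (n_spikes * math.log(2))


counts = [1, 0, 1, 0]
print(bits_per_spike([0.5, 0.5, 0.5, 0.5], counts))  # 0.0
print(bits_per_spike([0.9, 0.1, 0.9, 0.1], counts) > 0.0)  # True
```

A value of zero means the model is no better than predicting each unit's average rate, so positive values indicate that the predicted rates track the spiking over time.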
| ### Held-out OOD calibration sessions (DANDI 000941, days +6 to +30, 3 sessions) |
| |
| Variance-weighted R², calibration ≈ 640 bins: |
| |
| | Session | CortexFM + affine | POYO-1 + affine | Δ (POYO-1 − CortexFM) | |
| |---|---|---|---| |
| | 20121004 | +0.4443 | −0.0209 | −0.4652 | |
| | 20121017 | +0.2730 | −0.2326 | −0.5056 | |
| | 20121024 | +0.4046 | +0.1824 | −0.2222 | |
| | Per-session mean ± std | **+0.374 ± 0.073** | −0.024 ± 0.169 | −0.398 | |
| | Pooled R² | **+0.387** | −0.008 | **−0.395** | |
| |
| The decisive separation between CortexFM and POYO-1 emerges in the OOD held-out sessions. Taking Δ here as CortexFM − POYO-1 (the opposite sign of the table column), the held-in gap is small (Δ = +0.031 pooled R²), but the held-out gap reaches **Δ = +0.395 pooled R²**. Because both backbones use the *identical* per-session output-space affine, this gap is attributable to backbone representation quality rather than to the adapter recipe. |
| |
| ### Why zero-shot R² is negative |
| |
| Three factors documented in the thesis (Chapter 6): |
| 1. **Objective mismatch**: pretraining minimizes joint masked-recon + InfoNCE, whereas FALCON M1 measures EMG-only regression. |
| 2. **Inference-time input shift**: EMG is the prediction target at evaluation time, so the EMG tokenizer is fed zeros — out of pretraining distribution. |
| 3. **Absence of per-session linear correction**: standard FALCON pipelines fit a shallow regressor per session; CortexFM zero-shot does not. |
| |
| The Ridge linear probe addresses factor (3) (pooled R² enters the positive regime at +0.125); EMG-head fine-tuning addresses factors (1) and (2) (per-session R² rises to −0.038); the per-session affine addresses all three jointly (pooled R² = +0.529). |
| |
| --- |
| |
| ## Limitations |
| |
| 1. **Single-subject pretraining**. Pretraining is restricted to MonkeyL (DANDI 000941). Cross-subject transfer to MonkeyN, MC_Maze, or human cortex is not validated. |
| 2. **n = 3 OOD sessions**. The held-out evaluation uses only three sessions; the observed effect sizes are large, but the power of formal (Holm-corrected) significance tests is limited. |
| 3. **Calibration dependence on OOD**. With fewer than ~400 calibration bins (< 8 s) on a held-out session, OOD R² becomes unstable. Real-time deployment therefore requires a brief calibration cycle per session. |
| 4. **EMG-only readout**. The model decodes 16-channel EMG envelopes, not kinematics directly. A downstream kinematic stage is needed for end-effector control. |
| 5. **No clinical validation**. The model is research-grade. It has not been evaluated for safety, robustness, or efficacy in any clinical BCI setting and must not be used as such. |
|
|
| --- |
|
|
| ## How to use |
|
|
| ```python |
| import torch |
| from cortex_fm.training import CortexFMPretrainModule |
| |
| # Load the pretrained checkpoint onto the GPU. |
| module = CortexFMPretrainModule.load_from_checkpoint( |
|     "epoch28-0.2599.ckpt", |
|     map_location="cuda", |
|     strict=True, |
| ) |
| module.eval() |
| |
| # Inference: spike counts -> EMG envelope |
| # spike_counts: (B, T=64, N=64) int |
| # emg_placeholder: (B, T=64, M=16) float (zeros at inference) |
| spike_counts = torch.zeros(1, 64, 64, dtype=torch.long, device="cuda") |
| emg_placeholder = torch.zeros(1, 64, 16, device="cuda") |
| |
| # Gradients are not needed for inference. |
| with torch.no_grad(): |
|     out = module(spike_counts, emg_placeholder) |
| |
| emg_pred = out["emg_pred"].view(1, 64, 16)[:, -1, :]  # (B, 16) at last bin |
| log_rate = out["log_rate"]  # (B, T, 64) Poisson log-rates |
| ``` |
|
|
| For FALCON M1 evaluation, see `benchmark_wrapper/` and the `CortexFMFalconDecoder` reference implementation. |
|
|
| --- |
|
|
| ## Citation |
|
|
| ```bibtex |
| @mastersthesis{shin2026cortexfm, |
|   author  = {Shin, Jaeguk}, |
|   title   = {{CortexFM}: A Lightweight Multimodal Foundation Model for Spike--EMG Decoding on Public Brain--Computer Interface Data}, |
|   school  = {Dong-eui University}, |
|   type    = {{M.S.} thesis}, |
|   year    = {2026}, |
|   month   = jun, |
|   address = {Busan, Republic of Korea}, |
| } |
| ``` |
|
|
| If you also use the FALCON benchmark, please cite Karpowicz et al. 2024. |
|
|
| --- |
|
|
| ## Ethical considerations |
|
|
| CortexFM is a research artifact. The following points apply: |
|
|
| - **Animal data**. Pretraining data come from a single non-human primate recorded under the original Rouse & Schieber 2018 protocols (DANDI 000941, CC-BY-4.0). No additional animal experiments were conducted for this release. |
| - **No human data**. The released checkpoint has *not* been trained or evaluated on human neural recordings. |
| - **Dual-use awareness**. Invasive BCI decoding can in principle inform assistive devices or surveillance / commercial neuro-monitoring systems. The author releases this checkpoint to support open scientific reproduction and lightweight benchmarking; downstream users are responsible for ensuring their applications respect informed consent, neural privacy, and applicable medical-device regulation. |
| - **No clinical claims**. CortexFM has not been evaluated against clinical-grade BCIs and must not be deployed in patient-facing systems without full regulatory validation. |
|
|
| --- |
|
|
| ## Model details |
|
|
| - **Developed by**: Jaeguk Shin (신재국), Dong-eui University, Department of Artificial Intelligence — M.S. thesis (June 2026), advised by faculty of Dong-eui University AI Department. |
| - **Model type**: Multimodal Transformer foundation model (spike + EMG). |
| - **Language**: N/A (the inputs are neural signals). |
| - **License**: MIT (see `LICENSE`). |
| - **Finetuned from**: Trained from scratch. |
| - **Related links**: Thesis full text and reproducibility scripts to be released at the GitHub companion repository. |
|
|