	---
	license: mit
	language:
	- en
	- ko
	tags:
	- brain-computer-interface
	- foundation-model
	- neural-decoding
	- transformer
	- cross-modal
	- bci
	- electrophysiology
	- emg
	- spike-trains
	pipeline_tag: feature-extraction
	library_name: pytorch
	metrics:
	- r-squared
	---

	<!-- Training data: DANDI Archive Dandiset 000941 (Monkey L motor cortex, 4 sessions, ~3h 38min). External dataset, not a Hugging Face Hub dataset. See model_card.md §Training Data for full details. -->


	# CortexFM — A Lightweight Multimodal Foundation Model for Spike + EMG BCI

	A 5.04 M-parameter multimodal Transformer foundation model and lightweight BCI backbone that jointly learns spike trains and intramuscular EMG envelopes from public DANDI motor-cortex data, evaluated on the FALCON M1 benchmark.

	---

	## Model description

	CortexFM is a small, public, and fully reproducible foundation model for invasive brain–computer interface (BCI) decoding. It targets the regime where private million-hour pretraining data and 45 M – 350 M parameter backbones are not available, and asks how far neural-decoding quality can be pushed with about 3.6 hours (3 h 38 min) of public data and a ~5 M-parameter model trained in about six minutes on a single consumer GPU.

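	CortexFM rests on three design principles: (i) preserving per-unit and per-muscle identity, (ii) reproducibility from public data and public benchmarks alone, and (iii) alignment with the FALCON standard. The backbone is a 10-layer, 6-head, d = 192 PreNorm Transformer (4.45 M parameters, FLASH SDPA). The heads form three branches: spike reconstruction (Poisson NLL), EMG reconstruction (MSE), and cross-modal contrastive learning (InfoNCE).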

	### Architecture summary

	| Component | Configuration |
	|---|---|
	| Backbone | PreNorm Transformer, 10 layers, 6 heads, d_model = 192, FFN = 768, GELU |
	| Attention | SDPA with FLASH / EFFICIENT backends (PyTorch 2.10) |
	| Backbone params | 4,449,024 (≈ 4.45 M) |
	| Spike tokenizer | Per-unit learned embedding ⊕ log(1 + α · count) + temporal positional embedding |
	| EMG tokenizer | Per-muscle learned embedding ⊕ scalar-to-vector MLP + temporal positional embedding |
	| Heads | Spike recon (Poisson NLL), EMG recon (MSE), Contrastive projector (d_p = 128) |
	| Total params | 5,044,994 (≈ 5.04 M) |
	| Bin size | 20 ms (FALCON official) |
	| Context length T | 64 bins (1.28 s) → 1,088 tokens |
	| Mixed precision | BF16 (InfoNCE softmax promoted to FP32) |
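
The backbone count in the table can be reproduced by hand. The sketch below assumes a standard PreNorm block layout: biased Q/K/V/output projections, a two-layer biased FFN, two LayerNorms per block, and one final LayerNorm. That layout is an assumption rather than something stated above, although it does reproduce the reported figure. The helper name `transformer_backbone_params` is illustrative.

```python
def transformer_backbone_params(n_layers: int, d_model: int, d_ffn: int) -> int:
    """Count weights and biases in a stack of PreNorm Transformer blocks."""
    # Q, K, V and output projections: each d_model x d_model plus a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Feed-forward network d_model -> d_ffn -> d_model, with biases.
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * (2 * d_model)
    # One final LayerNorm after the last block (assumed).
    final_norm = 2 * d_model
    return n_layers * (attention + ffn + norms) + final_norm


backbone = transformer_backbone_params(n_layers=10, d_model=192, d_ffn=768)
total = 5_044_994  # total parameter count reported in the table

print(f"{backbone:,}")          # 4,449,024
print(f"{total - backbone:,}")  # 595,970
```

Under these assumptions, the remaining 595,970 parameters lie outside the backbone, in the tokenizers and heads; how they split between those components is not derived here.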

	### Intended uses

	- Research-grade neural decoding of primate motor cortex (M1) spike trains into 16-channel intramuscular EMG envelopes.
	- Backbone for downstream BCI probes: as a frozen feature extractor with a thin (~3 K-param) per-session output-space affine adapter, CortexFM enables session-1 adaptation to held-out recording days.
	- Cross-modal pretraining baseline for studies that compare per-unit tokenization against patch-tokenized BCI foundation models (e.g., NDT-3) at a 1/9 – 1/69 parameter ratio.
	- Educational reference for compact (~5 M-param) foundation-model training from public data on a single consumer GPU.

	### Out-of-scope uses

	- Clinical or assistive deployment. This is a research checkpoint trained on a single non-human primate (MonkeyL, DANDI 000941). It is not intended for human BCI control or medical decision-making.
	- Cross-subject generalization. The pretraining set is one subject; cross-subject transfer (e.g., MonkeyN, MC_Maze, human cortex) has not been validated.
	- Direct kinematic decoding. The model outputs EMG envelopes; downstream kinematic readouts require an additional decoding stage.
	- Real-time control without calibration. Held-out sessions require a brief (≥ ~8 s) per-session affine calibration to enter the positive-R² regime.

	---

	## Training data

	| Dataset | DOI | Subject | Modality | Duration |
	|---|---|---|---|---|
	| DANDI:000941 (Rouse & Schieber 2018) | [10.48324/dandi.000941/0.211015.0907](https://dandiarchive.org/dandiset/000941) | MonkeyL (1 NHP) | M1 spikes (64 units) + intramuscular EMG (16 muscles) | 11 sessions total |

	Pretraining uses the four held-in calibration sessions of DANDI 000941 (sessions 20120924, 20120926, 20120927, 20120928), totaling 3 h 38 min of paired spike + EMG recordings. The remaining 7 sessions (4 minival + 3 held-out calibration) are reserved for FALCON M1 evaluation and OOD session-1 adaptation.

	License: CC-BY-4.0 (DANDI public release).

	### Preprocessing pipeline

	- EMG: 60/180/200/300/400 Hz notch → 4th-order Butterworth high-pass at 65 Hz → rectify → 99 % clip → 95 % normalize → polyphase resample (1 kHz → 50 Hz) → re-rectify → 10 Hz low-pass envelope.
	- Spike: 20 ms bin counts per unit on the same time grid.
	- Output: Zarr store with `/emg/envelope`, `/spike/counts`, `/eval_mask`, and trial markers. Spike/EMG share a common 20 ms bin axis (FALCON official invariant).
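
The spike step is simple enough to sketch in plain Python. The function below bins per-unit spike times onto a 20 ms grid and returns counts in a (time bin, unit) layout. The function name, the seconds-based input format, and the toy spike times are assumptions for illustration, not the project's actual preprocessing code.

```python
from typing import List, Sequence


def bin_spike_times(
    spike_times_s: Sequence[Sequence[float]],
    duration_s: float,
    bin_ms: int = 20,
) -> List[List[int]]:
    """Return counts[t][u]: number of spikes of unit u that fall in bin t."""
    n_bins = int(round(duration_s * 1000)) // bin_ms
    counts = [[0] * len(spike_times_s) for _ in range(n_bins)]
    for unit, times in enumerate(spike_times_s):
        for t in times:
            # Work in integer microseconds so bin edges are not blurred by
            # floating-point division.
            bin_index = int(round(t * 1_000_000)) // (bin_ms * 1000)
            if 0 <= bin_index < n_bins:
                counts[bin_index][unit] += 1
    return counts


# Two hypothetical units recorded for 100 ms, i.e. five 20 ms bins.
spikes = [[0.005, 0.015, 0.025, 0.071], [0.041, 0.099]]
counts = bin_spike_times(spikes, duration_s=0.1)
print(counts)  # [[2, 0], [1, 0], [0, 1], [1, 0], [0, 1]]
```

Spikes outside the recording window are dropped rather than clamped into the first or last bin, which keeps the counts aligned with an EMG envelope sampled on the same grid.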

	---

	## Training procedure

	### Objectives

	Joint loss with three components:

	$$
	\mathcal{L}_{\text{total}} = w_{\text{spike}} \cdot \mathcal{L}_{\text{spike}} + w_{\text{emg}} \cdot \mathcal{L}_{\text{emg}} + w_{\text{cont}} \cdot \mathcal{L}_{\text{cont}}
	$$

	- $\mathcal{L}_{\text{spike}}$: Poisson NLL over per-unit spike counts.
	- $\mathcal{L}_{\text{emg}}$: MSE on per-muscle EMG envelopes.
	- $\mathcal{L}_{\text{cont}}$: Symmetric InfoNCE on pooled cross-modal embeddings (FP32-promoted), temperature τ = 0.1.

	Loss weights $(w_{\text{spike}}, w_{\text{emg}}, w_{\text{cont}}) = (1.0, 1.0, 0.5)$.
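
As a rough illustration of how the three terms combine, here is a dependency-free sketch. It assumes the spike head emits log-rates (the same form as `torch.nn.PoissonNLLLoss` with `log_input=True`, with the constant log(k!) term dropped) and that the contrastive term uses cosine similarity on L2-normalized embeddings. Both choices, and all function names, are assumptions rather than the released training code.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def poisson_nll(log_rate: Matrix, counts: Matrix) -> float:
    """Mean Poisson NLL given predicted log-rates; log(k!) is dropped."""
    terms = [math.exp(lr) - k * lr
             for lr_row, k_row in zip(log_rate, counts)
             for lr, k in zip(lr_row, k_row)]
    return sum(terms) / len(terms)


def mse(pred: Matrix, target: Matrix) -> float:
    """Mean squared error over all bins and muscles."""
    terms = [(p - y) ** 2
             for p_row, y_row in zip(pred, target)
             for p, y in zip(p_row, y_row)]
    return sum(terms) / len(terms)


def _unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _diag_cross_entropy(logits: Matrix) -> float:
    """Cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(spike_emb: Matrix, emg_emb: Matrix, tau: float = 0.1) -> float:
    """Average of spike->EMG and EMG->spike InfoNCE on cosine similarities."""
    s = [_unit(v) for v in spike_emb]
    e = [_unit(v) for v in emg_emb]
    logits = [[sum(a * b for a, b in zip(si, ej)) / tau for ej in e] for si in s]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(transposed))


def total_loss(l_spike: float, l_emg: float, l_cont: float,
               weights=(1.0, 1.0, 0.5)) -> float:
    w_spike, w_emg, w_cont = weights
    return w_spike * l_spike + w_emg * l_emg + w_cont * l_cont


# Toy batch of two windows: matched pairs give a near-zero contrastive loss,
# swapped pairs give a loss close to 1 / tau = 10.
matched = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With τ = 0.1, cosine similarities in [−1, 1] become logits in [−10, 10], which is why the swapped toy batch lands near 10.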

	### Masking

	Spike and EMG tokens are masked independently at a 50 % rate per bin. Each modality must then be reconstructed from its own unmasked tokens and from the (independently masked) other modality.
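
One way to read this masking scheme is an independent Bernoulli(0.5) draw for every (bin, channel) token, sampled separately for each modality. That reading, along with the helper names below, is an assumption; the actual sampler may differ, for example by masking a fixed number of tokens per bin.

```python
import random
from typing import List


def sample_mask(n_bins: int, n_channels: int, ratio: float,
                rng: random.Random) -> List[List[bool]]:
    """True marks a token that is hidden and must be reconstructed."""
    return [[rng.random() < ratio for _ in range(n_channels)]
            for _ in range(n_bins)]


def masked_fraction(mask: List[List[bool]]) -> float:
    flat = [m for row in mask for m in row]
    return sum(flat) / len(flat)


rng = random.Random(0)
# The two modalities draw separate masks, so a bin can be hidden in one
# modality and visible in the other.
spike_mask = sample_mask(n_bins=64, n_channels=64, ratio=0.5, rng=rng)  # 64 units
emg_mask = sample_mask(n_bins=64, n_channels=16, ratio=0.5, rng=rng)    # 16 muscles

print(round(masked_fraction(spike_mask), 2), round(masked_fraction(emg_mask), 2))
```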

	### Optimization

	| Hyperparameter | Value |
	|---|---|
	| Optimizer | AdamW |
	| Learning rate | 3 × 10⁻⁴ |
	| Weight decay | 0.01 |
	| LR schedule | Linear warmup (500 steps) → cosine decay |
	| Batch size | 8 |
	| Context length T | 64 bins |
	| Mixed precision | BF16 (InfoNCE softmax in FP32) |
	| Gradient clip | 1.0 |
	| Max epochs | 30 (best validation loss at epoch 28) |
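
The schedule in the table can be written as a single function of the step index. This is a minimal sketch: the total step count is not given here, so `total_steps=10_000` is a placeholder, and decaying all the way to zero (rather than to a floor) is an assumption.

```python
import math


def learning_rate(step: int, total_steps: int,
                  base_lr: float = 3e-4, warmup_steps: int = 500) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


total_steps = 10_000  # placeholder, not the real training length
for step in (0, 250, 500, 5_250, 10_000):
    print(step, f"{learning_rate(step, total_steps):.2e}")
```

The learning rate peaks at step 500 and is back to half its peak value midway through the cosine phase.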

	### Training environment

	| Item | Value |
	|---|---|
	| GPU | NVIDIA RTX 5080 (16 GB GDDR7, sm_120), single consumer card |
	| OS / runtime | WSL2 Ubuntu 24.04 |
	| Framework | PyTorch 2.10.0 + cu128, PyTorch Lightning |
	| Wall-clock training time | ≈ 6 minutes for 30 epochs |
	| Best checkpoint | `epoch28-0.2599.ckpt` (60.7 MB, val_loss = 0.2599) |
	| External cloud GPU | None (fully on-device) |

	Train/val gap stayed below 0.03 throughout, so no early stopping was applied and the lowest-validation-loss checkpoint was kept verbatim.

	---

	## Evaluation

	### FALCON M1 (held-in calibration sessions, variance-weighted R² over 16 muscles)

	| Setting | Params (used) | Per-session R² (mean ± std) | Pooled R² | NL |
	|---|---|---|---|---|
	| POYO-1 zero-EMG floor | 15.47 M (0 used) | −1.273 ± 0.299 | — | 2.4e-5 |
	| POYO-1 frozen + per-session affine | 15.47 M + ~2 K/sess | +0.451 ± 0.112 | +0.498 | — |
	| CortexFM zero-shot | 5.04 M | −1.035 ± 0.234 | — | 0.131 |
	| CortexFM frozen + Ridge linear probe | 5.04 M + ~3 K | −0.258 ± 0.327 | +0.125 | — |
	| CortexFM + EMG-head FT 200 step (ZS) | 5.04 M (FT 37 K = 0.75 %) | −0.038 ± 0.063 | — | — |
	| CortexFM frozen + per-session affine | 5.04 M + ~3 K/sess | +0.484 ± 0.102 | +0.529 | — |

	NL = FALCON normalized latency (inference time / data duration).
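
For readers unfamiliar with the metric, the sketch below computes a variance-weighted R² as one minus the ratio of summed residual to summed total sums of squares across channels. This is the same definition as scikit-learn's `r2_score` with `multioutput="variance_weighted"`. Whether the FALCON scorer matches it in every detail is assumed here, and the toy arrays are made up.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def variance_weighted_r2(y_true: Matrix, y_pred: Matrix) -> float:
    """1 - sum_m SS_res[m] / sum_m SS_tot[m]; rows are bins, columns are muscles."""
    n_bins, n_channels = len(y_true), len(y_true[0])
    ss_res = ss_tot = 0.0
    for m in range(n_channels):
        col_true = [row[m] for row in y_true]
        col_pred = [row[m] for row in y_pred]
        mean = sum(col_true) / n_bins
        ss_res += sum((t - p) ** 2 for t, p in zip(col_true, col_pred))
        ss_tot += sum((t - mean) ** 2 for t in col_true)
    return 1.0 - ss_res / ss_tot


def normalized_latency(inference_seconds: float, data_seconds: float) -> float:
    """NL as defined above: inference time divided by data duration."""
    return inference_seconds / data_seconds


y_true = [[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]]  # 3 bins, 2 channels
zeros = [[0.0, 0.0]] * 3
means = [[1.0, 10.0]] * 3

print(variance_weighted_r2(y_true, y_true))  # 1.0
print(variance_weighted_r2(y_true, means))   # 0.0
print(variance_weighted_r2(y_true, zeros))   # -1.5
```

The last line shows why negative values are possible: a predictor that does worse than the per-channel mean, such as a constant at the wrong level, scores below zero.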

	### Auxiliary co-bps (CortexFM only)

	Mean 0.756 ± 0.128 bits/spike above per-unit mean-rate baseline on the four held-in calibration files.
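
A bits/spike score of this kind is usually computed as the Poisson log-likelihood gain of the predicted rates over a per-unit mean-rate baseline, divided by the spike count and by ln 2. The sketch below follows that common definition; the exact evaluation code behind the number above is not shown here, and the function names and toy data are illustrative.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def poisson_log_likelihood(rates: Matrix, counts: Matrix) -> float:
    """Sum of log Poisson(k | rate) over all bins and units."""
    return sum(k * math.log(max(r, 1e-9)) - r - math.lgamma(k + 1)
               for r_row, k_row in zip(rates, counts)
               for r, k in zip(r_row, k_row))


def bits_per_spike(rates: Matrix, counts: Matrix) -> float:
    """Log-likelihood gain over the per-unit mean rate, in bits per spike."""
    n_bins, n_units = len(counts), len(counts[0])
    mean_rates = [sum(row[u] for row in counts) / n_bins for u in range(n_units)]
    baseline = [mean_rates for _ in range(n_bins)]
    n_spikes = sum(sum(row) for row in counts)
    gain = poisson_log_likelihood(rates, counts) - poisson_log_likelihood(baseline, counts)
    return gain / (n_spikes * math.log(2))


counts = [[0], [4]]            # one unit, two bins, mean rate 2 spikes/bin
predicted = [[0.5], [3.5]]     # rates that track the counts
flat = [[2.0], [2.0]]          # rates equal to the baseline

print(round(bits_per_spike(predicted, counts), 3))  # 0.807
print(bits_per_spike(flat, counts))                 # 0.0
```

A score of zero means the model is no better than always predicting each unit's mean rate; positive values mean the predicted rates explain the spikes better than that baseline.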

	### Held-out OOD calibration sessions (DANDI 000941, days +6 to +30, 3 sessions)

	Variance-weighted R², calibration ≈ 640 bins:

	| Session | CortexFM + affine | POYO-1 + affine | Δ (POYO-1 − CortexFM) |
	|---|---|---|---|
	| 20121004 | +0.4443 | −0.0209 | −0.4652 |
	| 20121017 | +0.2730 | −0.2326 | −0.5056 |
	| 20121024 | +0.4046 | +0.1824 | −0.2222 |
	| Per-session mean ± std | +0.374 ± 0.073 | −0.024 ± 0.169 | −0.398 |
	| Pooled R² | +0.387 | −0.008 | −0.395 |

	The decisive separation between CortexFM and POYO-1 emerges in the OOD held-out sessions. With per-session affine adaptation, the held-in gap in pooled R² is small (+0.031 in CortexFM's favor: +0.529 vs. +0.498), but the held-out gap reaches +0.395 (+0.387 vs. −0.008). The table reports the same differences as POYO-1 − CortexFM, hence the negative signs. Because both backbones use the identical per-session output-space affine, this gap is attributable to backbone representation quality rather than to the adapter recipe.
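
The summary row of the held-out table can be recomputed from its three per-session rows. The short check below uses only the numbers printed in the table, and it shows that the ± values correspond to the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation.

```python
from statistics import mean, pstdev

# Per-session R² values copied from the held-out table.
cortexfm = {"20121004": 0.4443, "20121017": 0.2730, "20121024": 0.4046}
poyo1 = {"20121004": -0.0209, "20121017": -0.2326, "20121024": 0.1824}

# Same sign convention as the table column: POYO-1 minus CortexFM.
delta = {session: poyo1[session] - cortexfm[session] for session in cortexfm}

c_vals, p_vals, d_vals = list(cortexfm.values()), list(poyo1.values()), list(delta.values())

print(round(mean(c_vals), 3), round(pstdev(c_vals), 3))  # 0.374 0.073
print(round(mean(p_vals), 3), round(pstdev(p_vals), 3))  # -0.024 0.169
print(round(mean(d_vals), 3))                            # -0.398
```

With only three sessions, the sample standard deviation would be noticeably larger (about 0.090 for CortexFM), which is worth keeping in mind when reading the ± columns.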

	### Why zero-shot R² is negative

	Three factors documented in the thesis (Chapter 6):
	1. Objective mismatch: pretraining minimizes joint masked-recon + InfoNCE, whereas FALCON M1 measures EMG-only regression.
	2. Inference-time input shift: EMG is the prediction target at evaluation time, so the EMG tokenizer is fed zeros — out of pretraining distribution.
	3. Absence of per-session linear correction: standard FALCON pipelines fit a shallow regressor per session; CortexFM zero-shot does not.

	The Ridge linear probe addresses factor (3), bringing pooled R² into the positive regime at +0.125. EMG-head fine-tuning addresses factors (1) and (2), raising per-session R² to −0.038 ± 0.063. The per-session affine addresses all three jointly (pooled R² = +0.529).
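
Factor (3) is easy to see on a toy signal. In the sketch below, a prediction that tracks the target perfectly but has the wrong scale and offset scores a strongly negative R², and a least-squares gain and offset restores it. This is a single-channel simplification with made-up numbers, not the card's actual per-session affine adapter, and in practice the fit would use calibration data separate from the evaluation data.

```python
from typing import Sequence, Tuple


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_gain_offset(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for target ~ gain * pred + offset."""
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var = sum((p - mean_p) ** 2 for p in pred)
    gain = cov / var
    return gain, mean_t - gain * mean_p


target = [0.0, 1.0, 2.0, 3.0]
raw = [5.0, 5.1, 5.2, 5.3]  # right shape, wrong scale and offset

gain, offset = fit_gain_offset(raw, target)
corrected = [gain * p + offset for p in raw]

print(round(r2(target, raw), 3))        # -10.468
print(round(r2(target, corrected), 3))  # 1.0
```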

	---

	## Limitations

	1. Single-subject pretraining. Pretraining is restricted to MonkeyL (DANDI 000941). Cross-subject transfer to MonkeyN, MC_Maze, or human cortex is not validated.
	2. n = 3 OOD sessions. The held-out evaluation uses three sessions; effect sizes are large but formal Holm-corrected statistical power is limited.
	3. Calibration dependence on OOD. With fewer than ~400 calibration bins (< 8 s) on a held-out session, OOD R² becomes unstable. Real-time deployment therefore requires a brief calibration cycle per session.
	4. EMG-only readout. The model decodes 16-channel EMG envelopes, not kinematics directly. A downstream kinematic stage is needed for end-effector control.
	5. No clinical validation. The model is research-grade. It has not been evaluated for safety, robustness, or efficacy in any clinical BCI setting and must not be used as such.

	---

	## How to use

	```python
	import torch
	from cortex_fm.training import CortexFMPretrainModule

	# Load checkpoint
	module = CortexFMPretrainModule.load_from_checkpoint(
	    "epoch28-0.2599.ckpt",
	    map_location="cuda",
	    strict=True,
	)
	module.eval()

	# Inference: spike counts -> EMG envelope
	# spike_counts:    (B, T=64, N=64) int
	# emg_placeholder: (B, T=64, M=16) float (zeros at inference)
	spike_counts = torch.zeros(1, 64, 64, dtype=torch.long, device="cuda")
	emg_placeholder = torch.zeros(1, 64, 16, device="cuda")

	with torch.no_grad():
	    out = module(spike_counts, emg_placeholder)

	emg_pred = out["emg_pred"].view(1, 64, 16)[:, -1, :]  # (B, 16) at last bin
	log_rate = out["log_rate"]  # (B, T, 64) Poisson log-rates
	```

	For FALCON M1 evaluation, see `benchmark_wrapper/` and the `CortexFMFalconDecoder` reference implementation.

	---

	## Citation

	```bibtex
	@mastersthesis{shin2026cortexfm,
	author = {Shin, Jaeguk},
	title = {{CortexFM}: A Lightweight Multimodal Foundation Model for Spike--EMG Decoding on Public Brain--Computer Interface Data},
	school = {Dong-eui University},
	type = {{M.S.} thesis},
	year = {2026},
	month = jun,
	address = {Busan, Republic of Korea},
	}
	```

	If you also use the FALCON benchmark, please cite Karpowicz et al. 2024.

	---

	## Ethical considerations

	CortexFM is a research artifact. The following points apply:

	- Animal data. Pretraining data come from a single non-human primate recorded under the original Rouse & Schieber 2018 protocols (DANDI 000941, CC-BY-4.0). No additional animal experiments were conducted for this release.
	- No human data. The released checkpoint has not been trained or evaluated on human neural recordings.
	- Dual-use awareness. Invasive BCI decoding can in principle inform assistive devices or surveillance / commercial neuro-monitoring systems. The author releases this checkpoint to support open scientific reproduction and lightweight benchmarking; downstream users are responsible for ensuring their applications respect informed consent, neural privacy, and applicable medical-device regulation.
	- No clinical claims. CortexFM has not been evaluated against clinical-grade BCIs and must not be deployed in patient-facing systems without full regulatory validation.

	---

	## Model details

	- Developed by: Jaeguk Shin (신재국), Dong-eui University, Department of Artificial Intelligence — M.S. thesis (June 2026), advised by faculty of Dong-eui University AI Department.
	- Model type: Multimodal Transformer foundation model (spike + EMG).
	- Language: N/A (the inputs are neural signals; the model card is bilingual EN/KO).
	- License: MIT (see `LICENSE`).
	- Finetuned from: Trained from scratch.
	- Related links: Thesis full text and reproducibility scripts to be released at the GitHub companion repository.