	# MnemoDyn: Learning Resting State Dynamics from 40K fMRI Sequences

	[[Paper]]() [[Poster]]() [[Slide]]()

	### Sourav Pal, Viet Luong, Hoseok Lee, Tingting Dan, Guorong Wu, Richard Davidson, Won Hwa Kim, Vikas Singh

	![MnemoDyn architecture](asset/braine-1.png)

	MnemoDyn is an operator-learning foundation model for resting-state fMRI, combining multi-resolution wavelet dynamics with CDE-style temporal modeling.

	## Update

	MnemoDyn is now published on Hugging Face: https://huggingface.co/vhluong/MnemoDyn

	You can also publish your own trained checkpoint directly from this repo.

	## Tutorial

	A usage walkthrough is available as a Google Colab notebook:

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IeWYPmwZAj5zA_khQHmgKOXjF8DJXVNo?usp=sharing)

	## At A Glance

	- Pretraining backbones: `coe/light/model/main.py`, `coe/light/model/main_masked_autoencode.py`, `coe/light/model/main_masked_autoencode_jepa.py`, `coe/light/model/main_denoise.py`, `coe/light/model/orion.py`
	- Core model modules: `coe/light/model/conv1d_optimize.py`, `coe/light/model/dense_layer.py`, `coe/light/model/ema.py`, `coe/light/model/normalizer.py`
	- Downstream tasks (scripts in `coe/light/*.py`): HBN, ADHD200, ADNI, ABIDE, NKIR, UK Biobank, HCP Aging
	- Launch scripts: `coe/light/script/*.sh`

	## Repository Layout

	```text
	.
	├── highdim_req.txt
	├── pyproject.toml
	├── coe/
	│   ├── parcellation/
	│   └── light/
	│       ├── model/
	│       ├── script/
	│       ├── *_dataset.py
	│       └── classification.py, regress.py
	└── README.md
	```

	## Environment Setup

	Python 3.10+ is recommended.

	### Option A (recommended): uv

	```bash
	uv venv
	source .venv/bin/activate
	uv sync
	```

	### Option B: pip

	```bash
	python -m venv .venv
	source .venv/bin/activate
	pip install -r highdim_req.txt
	```

	Ensure your PyTorch build matches your CUDA stack.
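
One way to check the match is to print what the installed PyTorch build reports. The snippet below is a minimal sketch, not part of the repository; `describe_torch_cuda` is a hypothetical helper name, and it only uses standard `torch` attributes.

```python
import importlib.util


def describe_torch_cuda() -> dict:
    """Summarize the installed PyTorch build and the CUDA runtime it targets."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if a usable driver and GPU are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None` or `cuda_available` is `False` on a GPU machine, the installed wheel likely does not match the local driver or CUDA toolkit.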

	<!-- ## Data Processing Flow

	MnemoDyn expects parcellated rs-fMRI time series data (`*.dtseries.nii`) as input.

	If you are starting from volumetric NIfTI files (e.g., from fMRIPrep), you must run them through our Preprocessing Pipeline (described below) before training. This ensures proper alignment and time-step continuity.

	To use custom datasets:
	1. Preprocess your NIfTI files through `coe.preprocess.pipeline`.
	2. Ensure you have dataset metadata CSV/TSV files (labels, demographics, IDs).
	3. Update the hardcoded dataset paths (e.g., `/mnt/sourav/HBN_dtseries/`) in the downstream training launch scripts (`coe/light/script/*.sh`) to point to your new output directories. -->

	## Preprocessing Pipeline (NIfTI to Parcellated CIFTI)

	We provide a single Python CLI pipeline that maps volumetric NIfTI images to fs_LR surfaces and parcellates the resulting dense time series. The pipeline reads the repetition time (TR) from each NIfTI file, so downstream models are trained with the correct sampling interval.

	### Requirements
	- Connectome Workbench (`wb_command`) installed and on your system PATH.
	- `nibabel` and `tqdm` Python packages.

	### Usage
	Run the pipeline from the repository root:

	```bash
	python -m coe.preprocess.pipeline \
	--input-dir /path/to/niftis \
	--output-dir /path/to/output_dir \
	--atlas /path/to/atlas.dlabel.nii \
	--pattern "*_task-rest_space-MNI305_preproc.nii.gz"
	```
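
The `--pattern` value reads like a shell-style glob. The sketch below illustrates which file names such a pattern would select, assuming the pipeline applies standard glob semantics to file names (this is an assumption, and the file names are hypothetical).

```python
from fnmatch import fnmatchcase

PATTERN = "*_task-rest_space-MNI305_preproc.nii.gz"

# Hypothetical file names, for illustration only.
candidates = [
    "sub-01_task-rest_space-MNI305_preproc.nii.gz",
    "sub-01_task-rest_space-MNI152_preproc.nii.gz",
    "sub-02_task-nback_space-MNI305_preproc.nii.gz",
]

# Keep only names that match the glob, case-sensitively.
matched = [name for name in candidates if fnmatchcase(name, PATTERN)]
print(matched)
```

Only the first name matches: the second differs in template space and the third in task label.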

	The script calls `wb_command` to map and resample the left and right hemispheres, writes an intermediate `.dtseries.nii`, and then parcellates it with the provided atlas, carrying the native TR through each step.
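
To make the TR step concrete, here is a standard-library sketch of where the repetition time lives in a NIfTI-1 header. It is not the pipeline's implementation; `read_tr_seconds` is a hypothetical helper, and treating an unspecified time unit as seconds is an assumption. With `nibabel`, the same field is available through the image header's `get_zooms()`.

```python
import gzip
import struct
from pathlib import Path

# NIfTI-1 xyzt_units time codes (bits 3-5) mapped to a seconds conversion factor.
_TIME_UNIT_TO_SECONDS = {8: 1.0, 16: 1e-3, 24: 1e-6}


def read_tr_seconds(path: str) -> float:
    """Return the repetition time stored in a NIfTI-1 header, in seconds."""
    opener = gzip.open if Path(path).suffix == ".gz" else open
    with opener(path, "rb") as handle:
        header = handle.read(348)  # fixed-size NIfTI-1 header
    if len(header) < 348:
        raise ValueError(f"{path} is too short to hold a NIfTI-1 header")

    # sizeof_hdr must equal 348; use it to detect the byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError(f"{path} does not look like a NIfTI-1 file")

    # pixdim holds eight float32 values starting at byte 76; pixdim[4] is the time step.
    pixdim = struct.unpack(endian + "8f", header[76:108])
    # xyzt_units is one byte at offset 123; mask 0x38 selects the time unit.
    time_code = header[123] & 0x38
    # An unknown unit (code 0) is treated as seconds, which is an assumption.
    return pixdim[4] * _TIME_UNIT_TO_SECONDS.get(time_code, 1.0)
```

For example, `read_tr_seconds("/path/to/niftis/scan.nii.gz")` (a placeholder path) returns the TR that would be passed on to the dense and parcellated outputs.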


	## Quick Start

	### 1) Inspect pretraining CLIs

	```bash
	cd coe/light/model
	python main.py --help
	python main_masked_autoencode.py --help
	python main_masked_autoencode_jepa.py --help
	python main_denoise.py --help
	```

	### 2) Run pretraining

	```bash
	bash orion.sh
	```

	### 3) Run downstream examples

	```bash
	cd coe/light
	bash script/hbn_classification.sh
	bash script/adhd_200_diagnose.sh
	```

	<!-- ## Common Script Entry Points

	From `coe/light`:

	- `bash script/abide_classifcation_normal.sh`
	- `bash script/adhd_200_diagnose.sh`
	- `bash script/adhd_200_sex_classification.sh`
	- `bash script/adni_classification_amyloid.sh`
	- `bash script/adni_classification_sex.sh`
	- `bash script/hbn_classification.sh`
	- `bash script/hbn_regression.sh`
	- `bash script/hcp_aging_450.sh`
	- `bash script/hcp_aging_classification.sh`
	- `bash script/hcp_aging_regress_flanker.sh`
	- `bash script/hcp_aging_regress_neuroticism.sh`
	- `bash script/nkir_classification.sh`
	- `bash script/ukbiobank_age_regression.sh`
	- `bash script/ukbiobank_sex_classification.sh` -->

	## Typical Workflow

	1. Pretrain a foundation checkpoint (`coe/light/model/main*.py`).
	2. Save Lightning checkpoints under a versioned results directory.
	3. Fine-tune a downstream head using a task script in `coe/light/`.
	4. Track outputs and metrics under `Result/<ExperimentName>/...`.
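
Between steps 2 and 3 you usually need to pick one checkpoint from a run directory. The sketch below selects the file with the lowest validation error, assuming checkpoint filenames embed the metric as `val_mae=<value>` (for example `epoch=12-val_mae=0.0314.ckpt`). Both the naming scheme and the helper name `best_checkpoint` are assumptions, not the repository's API.

```python
import re
from pathlib import Path

# Matches e.g. "epoch=12-val_mae=0.0314.ckpt"; the naming scheme is an assumption.
_VAL_MAE = re.compile(r"val_mae=(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def best_checkpoint(version_dir: str) -> Path:
    """Return the .ckpt file under version_dir with the lowest val_mae in its name."""
    scored = []
    for ckpt in Path(version_dir).rglob("*.ckpt"):
        match = _VAL_MAE.search(ckpt.name)
        if match:  # skip files such as "last.ckpt" that carry no metric
            scored.append((float(match.group(1)), ckpt))
    if not scored:
        raise FileNotFoundError(f"no val_mae checkpoints found under {version_dir}")
    return min(scored, key=lambda item: item[0])[1]
```

Calling `best_checkpoint("/path/to/version_17")` (a placeholder path) returns the path to pass to the downstream fine-tuning script.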

	<!-- ## Publish to Hugging Face

	Install the Hub client:

	```bash
	pip install huggingface_hub
	```

	Log in once:

	```bash
	huggingface-cli login
	```

	Publish a training run folder (auto-picks best checkpoint by lowest `val_mae` in filename):

	```bash
	python -m coe.light.model.publish_to_hf \
	--repo-id <your-hf-username>/<model-name> \
	--version-dir /path/to/version_17
	```

	Or publish an explicit checkpoint:

	```bash
	python -m coe.light.model.publish_to_hf \
	--repo-id <your-hf-username>/<model-name> \
	--checkpoint /path/to/model.ckpt \
	--hparams /path/to/hparams.yaml \
	--metrics /path/to/metrics.csv
	```

	Load it back:

	```python
	from huggingface_hub import hf_hub_download
	from coe.light.model.main import LitORionModelOptimized

	ckpt = hf_hub_download(repo_id="<your-hf-username>/<model-name>", filename="model.ckpt")
	model = LitORionModelOptimized.load_from_checkpoint(ckpt, map_location="cpu")
	model.eval()
	``` -->

	## Notes and Caveats

	- This is a research codebase and is still being consolidated.
	- Some scripts may require branch-specific import/path adjustments.
	- Normalization and dataset utilities are partially duplicated across modules.
	- Reproducibility depends on matching preprocessing, atlas/parcellation, and dataset splits.

	## Citation

	If this work helps your research, please cite:

	```bibtex
	@inproceedings{
	pal2026mnemodyn,
	title={MnemoDyn: Learning Resting State Dynamics from $40$K {FMRI} sequences},
	author={Sourav Pal and Viet Luong and Hoseok Lee and Tingting Dan and Guorong Wu and Richard Davidson and Won Hwa Kim and Vikas Singh},
	booktitle={The Fourteenth International Conference on Learning Representations},
	year={2026},
	url={https://openreview.net/forum?id=zexMILcQOV}
	}
	```

	---
	license: mit
	---