Create README.md
README.md
ADDED
|
@@ -0,0 +1,123 @@
|
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- self-supervised-learning
|
| 4 |
+
- contrastive-learning
|
| 5 |
+
- neuroimaging
|
| 6 |
+
- fMRI
|
| 7 |
+
- rs-fMRI
|
| 8 |
+
- neuroscience
|
| 9 |
+
- representation-learning
|
| 10 |
+
- domain-generalization
|
| 11 |
+
- transfer-learning
|
| 12 |
+
- pytorch
|
| 13 |
+
library_name: transformers
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
# NeuroCLR
|
| 17 |
+
|
| 18 |
+
**NeuroCLR** is a self-supervised learning (SSL) framework for learning **robust, disorder-agnostic neural representations** from raw, unlabeled resting-state fMRI (rs-fMRI) regional time series. NeuroCLR is designed for **multi-site generalization** and **transfer** to downstream disorder classification with limited labeled data.
|
| 19 |
+
|
| 20 |
+
\[[GitHub Repo](https://github.com/pcdslab/NeuroCLR)\] | \[[Cite](#citation)\]
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## Abstract
|
| 25 |
+
|
| 26 |
+
Self-supervised learning (SSL) is a powerful technique in computer vision for drastically reducing the dependency on large amounts of labeled training data. The availability of large-scale, unannotated rs-fMRI data provides opportunities to develop superior machine-learning models for classifying disorders across heterogeneous sites and diverse subjects. In this paper, we propose NeuroCLR, a novel SSL framework. NeuroCLR extracts robust and rich invariant neural representations - consistent across diverse experimental subjects and disorders - using contrastive principles, spatially constrained learning, and augmented views of unlabeled raw fMRI time series data. We pre-trained NeuroCLR on a combination of heterogeneous disorders, using data from more than 3,600 participants across 44 different sites and 720,000 region-specific fMRI time series. The resulting disorder-agnostic pre-trained model is fine-tuned for downstream disorder-specific classification tasks on limited labeled data. We evaluate NeuroCLR on diverse disorder classification tasks and find that it outperforms both deep-learning and SSL models trained on a single disorder. Experiments also confirmed robust generalizability, with NeuroCLR consistently outperforming baselines across neuroimaging sites. This study is the first to present a robust and reproducible self-supervised methodology with an anatomically consistent contrastive objective that operates on raw unlabeled fMRI data and is capable of reliable transfer across diagnostic categories. This will cultivate stronger participation by computational and clinical researchers, setting the stage for the development of sophisticated NeuroCLR-based diagnostic models for various neurodegenerative and neurodevelopmental disorders.
|
| 27 |
+
|
| 28 |
+
---
|
| 29 |
+
|
| 30 |
+
## Model Contents (Subfolders)
|
| 31 |
+
|
| 32 |
+
This repository provides two loadable model artifacts:
|
| 33 |
+
|
| 34 |
+
- **`pretraining/`**: SSL encoder + projector (contrastive pre-training)
|
| 35 |
+
- **`classification/`**: Encoder + ResNet1D classification head (downstream classification)
|
| 36 |
+
|
| 37 |
+
Both subfolders use **custom code** (a Transformer encoder with a projector, and a ResNet1D head), so you must set `trust_remote_code=True` when loading.
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## Model Details
|
| 42 |
+
|
| 43 |
+
### 1) Pretraining model (`pretraining/`)
|
| 44 |
+
- **Input**: region-wise rs-fMRI time series
|
| 45 |
+
  Shape: **`[B, 1, 128]`** (batch, sequence length, embedding dimension)
|
| 46 |
+
- **Output**:
|
| 47 |
+
  - `h`: pooled representation, shape **`[B, 128]`**
|
| 48 |
+
  - `z`: projected representation, shape **`[B, projector_out2]`**
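
To illustrate how a projected output such as `z` is typically used during contrastive pre-training, here is a minimal sketch of a generic SimCLR-style NT-Xent loss over two augmented views. This is an assumption for illustration only: the function name `nt_xent_loss` and the `temperature` value are hypothetical, and the sketch does not include the spatially constrained component described in the abstract, so it should not be read as NeuroCLR's actual objective.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.5):
    """Generic NT-Xent loss for two batches of projections, each of shape [B, D]."""
    batch_size = z1.shape[0]
    # Stack both views and L2-normalize so dot products are cosine similarities.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # [2B, D]
    sim = z @ z.T / temperature  # [2B, 2B]
    # Mask self-similarity so a sample is never compared with itself.
    self_mask = torch.eye(2 * batch_size, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    # The positive for row i is the other view of the same sample.
    targets = torch.cat([
        torch.arange(batch_size, 2 * batch_size),
        torch.arange(0, batch_size),
    ]).to(z.device)
    return F.cross_entropy(sim, targets)


torch.manual_seed(0)
z_view1 = torch.randn(8, 16)  # stand-in for `z` from one augmented view
z_view2 = torch.randn(8, 16)  # stand-in for `z` from another augmented view
loss = nt_xent_loss(z_view1, z_view2)
print(loss.item())
```

In practice `z_view1` and `z_view2` would come from passing two augmentations of the same time series through the pretraining model; the augmentations themselves are defined in the GitHub repo.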
|
| 49 |
+
|
| 50 |
+
### 2) Classification model (`classification/`)
|
| 51 |
+
- **Input**: ROI-by-time representation (e.g., 200 ROIs)
|
| 52 |
+
  Shape: **`[B, 200, 128]`**
|
| 53 |
+
- **Output**:
|
| 54 |
+
  - `logits`: shape **`[B, num_labels]`**
|
| 55 |
+
  - `loss`: returned when labels are provided
|
| 56 |
+
|
| 57 |
+
**Note on freezing**: By default, the encoder is packaged with the classification model. Depending on the configuration used at training/export time, the encoder may be frozen for downstream training (recommended). See the GitHub repo for training scripts and details.
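
As a rough illustration of what freezing an encoder means in PyTorch, the sketch below uses a toy stand-in model. The class `ToyClassifier`, the helper `freeze_module`, and the attribute names `encoder` and `head` are hypothetical; the attribute names in the actual NeuroCLR classification model may differ, so check the custom code in the repo before adapting this.

```python
import torch.nn as nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in with an encoder and a classification head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(128, 128)  # assumed attribute name
        self.head = nn.Linear(128, 2)       # assumed attribute name

    def forward(self, x):
        # Encode each ROI, average over ROIs, then classify.
        return self.head(self.encoder(x).mean(dim=1))


def freeze_module(module):
    """Disable gradient updates for every parameter of `module`."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()  # also fixes dropout/normalization behaviour
    return module


model = ToyClassifier()
freeze_module(model.encoder)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['head.weight', 'head.bias']
```

Note that calling `model.train()` later switches the frozen submodule back to training mode (its parameters stay frozen), so call `model.encoder.eval()` again if that matters for your setup.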
|
| 58 |
+
|
| 59 |
+
---
|
| 60 |
+
|
| 61 |
+
## Model Usage
|
| 62 |
+
|
| 63 |
+
### Requirements
|
| 64 |
+
|
| 65 |
+
```bash
|
| 66 |
+
pip install torch transformers huggingface_hub safetensors
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
### Load the Pretraining Model (SSL Encoder)
|
| 70 |
+
|
| 71 |
+
```py
|
| 72 |
+
import torch
|
| 73 |
+
from transformers import AutoModel
|
| 74 |
+
|
| 75 |
+
model = AutoModel.from_pretrained(
|
| 76 |
+
    "SaeedLab/NeuroCLR",
|
| 77 |
+
    subfolder="pretraining",
|
| 78 |
+
    trust_remote_code=True
|
| 79 |
+
)
|
| 80 |
+
|
| 81 |
+
model.eval()
|
| 82 |
+
|
| 83 |
+
x = torch.randn(4, 1, 128) # [batch, sequence_length, embedding_dim]
|
| 84 |
+
|
| 85 |
+
with torch.no_grad():
|
| 86 |
+
    outputs = model(x)
|
| 87 |
+
|
| 88 |
+
print(outputs["h"].shape) # [4, 128]
|
| 89 |
+
print(outputs["z"].shape) # [4, projector_out2]
|
| 90 |
+
```
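
Because the pretraining model takes one region's time series at a time (`[B, 1, 128]`), a subject-level array of shape `[B, 200, 128]` has to be flattened before encoding if you want per-region embeddings. The sketch below shows one way to do that reshaping. `StandInEncoder` and `embed_regions` are hypothetical names, and the stand-in only mimics the documented input/output shapes so the example runs without downloading weights; swap in the loaded pretraining model at your own discretion.

```python
import torch
import torch.nn as nn


class StandInEncoder(nn.Module):
    """Hypothetical stand-in mimicking the documented I/O: [N, 1, 128] -> {"h": [N, 128]}."""

    def __init__(self, dim=128):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # Pool over the sequence axis (length 1 here) to get one vector per input.
        return {"h": self.proj(x).mean(dim=1)}


def embed_regions(encoder, roi_series):
    """Encode every ROI independently: [B, R, T] -> [B, R, H]."""
    batch, n_rois, n_timepoints = roi_series.shape
    # Treat each ROI time series as its own sample with sequence length 1.
    flat = roi_series.reshape(batch * n_rois, 1, n_timepoints)
    with torch.no_grad():
        h = encoder(flat)["h"]
    # Restore the subject/ROI layout.
    return h.reshape(batch, n_rois, -1)


encoder = StandInEncoder().eval()
roi_series = torch.randn(2, 200, 128)  # [subjects, ROIs, time points]
region_embeddings = embed_regions(encoder, roi_series)
print(region_embeddings.shape)  # torch.Size([2, 200, 128])
```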
|
| 91 |
+
|
| 92 |
+
### Load the Downstream Classification Model
|
| 93 |
+
```py
|
| 94 |
+
import torch
|
| 95 |
+
from transformers import AutoModelForSequenceClassification
|
| 96 |
+
|
| 97 |
+
model = AutoModelForSequenceClassification.from_pretrained(
|
| 98 |
+
    "SaeedLab/NeuroCLR",
|
| 99 |
+
    subfolder="classification",
|
| 100 |
+
    trust_remote_code=True
|
| 101 |
+
)
|
| 102 |
+
|
| 103 |
+
model.eval()
|
| 104 |
+
|
| 105 |
+
x = torch.randn(4, 200, 128) # [batch, n_rois, embedding_dim]
|
| 106 |
+
labels = torch.tensor([0, 1, 0, 1])
|
| 107 |
+
|
| 108 |
+
with torch.no_grad():
|
| 109 |
+
    outputs = model(x, labels=labels)
|
| 110 |
+
|
| 111 |
+
print(outputs["logits"].shape) # [4, 2]
|
| 112 |
+
print(outputs["loss"])
|
| 113 |
+
```
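
The `logits` returned above are unnormalized scores. A minimal sketch of turning them into class probabilities and predicted labels follows; the logit values here are made up for illustration, and what each class index means depends on the model's configuration, which this README does not specify.

```python
import torch

# Made-up logits for two subjects and two classes, shape [B, num_labels].
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])

probs = torch.softmax(logits, dim=-1)  # each row sums to 1
preds = probs.argmax(dim=-1)           # index of the most likely class

print(preds.tolist())  # [0, 1]
```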
|
| 114 |
+
|
| 115 |
+
## Citation
|
| 116 |
+
|
| 117 |
+
The paper is under review. As soon as it is accepted, we will update this section.
|
| 118 |
+
|
| 119 |
+
|
| 120 |
+
## Contact
|
| 121 |
+
|
| 122 |
+
For any additional questions or comments, contact Fahad Saeed (fsaeed@fiu.edu).
|
| 123 |
+
|