Commit 98d2aa0 (verified) by falmuqhim · 1 Parent(s): 904aea7

Update README.md

  1. README.md +37 -78
README.md CHANGED
@@ -15,132 +15,91 @@ library_name: transformers
15
 
16
  # NeuroCLR
17
 
18
- **NeuroCLR** is a self-supervised learning (SSL) framework for learning **robust, disorder-agnostic neural representations** from raw, unlabeled resting-state fMRI (rs-fMRI) regional time series. NeuroCLR is designed for **multi-site generalization** and **transfer** to downstream disorder classification with limited labeled data.
19
 
20
- \[[GitHub Repo](https://github.com/pcdslab/NeuroCLR)\] | \[[Cite](#citation)\]
21
 
22
  ---
23
 
24
  ## Abstract
25
 
26
- Self-supervised learning (SSL) is a powerful technique in computer vision for drastically reducing the dependency on large amounts of labeled training data. The availability of large-scale, unannotated, rs-fMRI data provides opportunities for the development of superior machine-learning models for classification of disorders across heterogeneous sites, and diverse subjects. In this paper, we propose NeuroCLR, a novel self-supervised learning (SSL) framework. NeuroCLR extracts robust and rich invariant neural representations - consistent across diverse experimental subjects and disorders - using contrastive principles, spatially constrained learning, and augmented views of unlabeled raw fMRI time series data. We pre-trained NeuroCLR using a combination of heterogeneous disorders from more than 3,600 participants across 44 different sites, and 720,000 region-specific time series fMRI data. The resultant disorder-agnostic pre-trained model is fine-tuned for downstream disorder-specific classification tasks on limited labelled data. We evaluate NeuroCLR on diverse disorder classification tasks and find that it outperforms both deep-learning, and SSL models that have been trained on a single disorder. Experiments also confirmed robust generalizability, consistently outperforming baselines across neuroimaging sites. This study is the first to present robust and reproducible self-supervised methodology with anatomically consistent contrastive objective that operates on raw unlabelled fMRI data, capable of reliable transfer across diagnostic categories. This will cultivate stronger participation by computational and clinical researchers, setting the stage for the development of sophisticated diagnostic models, for various neurodegenerative and neurodevelopmental disorders, leveraging NeuroCLR.
27
 
28
  ---
29
 
30
- ## Model Contents (Subfolders)
31
 
32
- This repository provides two loadable model artifacts:
33
 
34
- - **`pretraining/`**: SSL encoder + projector (contrastive pre-training)
35
- - **`classification/`**: Encoder + ResNet1D classification head (downstream classification)
36
 
37
- Both subfolders use **custom code** (Transformer encoder + projector; and a ResNet1D head), so you must set `trust_remote_code=True` when loading.
38
 
39
  ---
40
 
41
  ## Model Details
42
 
43
- ### 1) Pretraining model (`pretraining/`)
 
44
  - **Input**: region-wise rs-fMRI time series
45
- Shape: **`[B, 1, 128]`** (batch, sequence length, embedding dimension)
46
  - **Output**:
47
  - `h`: pooled representation, shape **`[B, 128]`**
48
- - `z`: projected representation, shape **`[B, projector_out2]`**
49
-
50
- ### 2) Classification model (`classification/`)
51
- - **Input**: ROI-by-time representation (e.g., 200 ROIs)
52
- Shape: **`[B, 200, 128]`**
53
- - **Output**:
54
- - `logits`: shape **`[B, num_labels]`**
55
- - `loss`: returned when labels are provided
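The two input layouts listed above can be produced from a single subject's ROI-by-time matrix. The following is a minimal sketch, assuming the last axis holds 128 time points (as the updated README text states) and that cropping to the first 128 samples is acceptable. The helper name `to_model_inputs`, the crop strategy, and the synthetic data are illustrative assumptions, not the authors' preprocessing.

```python
import numpy as np


def to_model_inputs(roi_ts: np.ndarray, length: int = 128):
    """Crop one subject's [n_rois, n_timepoints] matrix to `length` samples
    and return arrays in the two layouts listed above."""
    n_rois, n_timepoints = roi_ts.shape
    if n_timepoints < length:
        raise ValueError(f"need at least {length} time points, got {n_timepoints}")
    # Assumption: keep the first `length` samples of every region.
    cropped = roi_ts[:, :length].astype(np.float32)
    per_region = cropped[:, None, :]   # [n_rois, 1, length]: one region per batch item
    per_subject = cropped[None, :, :]  # [1, n_rois, length]: one subject per batch item
    return per_region, per_subject


rng = np.random.default_rng(0)
subject = rng.standard_normal((200, 176))  # synthetic stand-in: 200 ROIs, 176 time points
pre, clf = to_model_inputs(subject)
print(pre.shape, clf.shape)  # (200, 1, 128) (1, 200, 128)
```

Either array can be converted with `torch.from_numpy` before being passed to the corresponding model.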
56
 
57
- **Note on freezing**: By default, the encoder is packaged with the classification model. Depending on the configuration used at training/export time, the encoder may be frozen for downstream training (recommended). See the GitHub repo for training scripts and details.
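Freezing, as described in the note above, can be sketched in plain PyTorch. `ToyClassifier` and its `encoder` / `head` attribute names are hypothetical stand-ins, not the NeuroCLR classes; the actual attribute names should be checked in the repository's custom code.

```python
import torch
from torch import nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in for an encoder + classification head."""

    def __init__(self, n_rois: int = 200, length: int = 128, num_labels: int = 2):
        super().__init__()
        self.encoder = nn.Linear(length, 128)            # placeholder encoder
        self.head = nn.Linear(n_rois * 128, num_labels)  # placeholder head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)              # [B, n_rois, 128]
        return self.head(h.flatten(1))   # [B, num_labels]


model = ToyClassifier()

# Freeze the encoder so that only the head receives gradient updates.
for p in model.encoder.parameters():
    p.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

x = torch.randn(4, 200, 128)         # [B, N_ROIs, 128]
labels = torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), labels)
loss.backward()
optimizer.step()
```

After `backward()`, the frozen encoder parameters have no gradients, so the optimizer step changes only the head.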
58
 
59
  ---
60
 
61
- ## Usage (PyTorch)
62
 
63
- ```python
64
- import torch
65
- from transformers import AutoModel
66
-
67
- model = AutoModel.from_pretrained(
68
-     "SaeedLab/NeuroCLR",
69
-     subfolder="pretraining",
70
-     trust_remote_code=True
71
- )
72
-
73
- model.eval()
74
-
75
- x = torch.randn(4, 1, 128) # [batch, seq_len, feature_dim]
76
-
77
- with torch.no_grad():
78
-     outputs = model(x)
79
-
80
- print(outputs["h"].shape)
81
- print(outputs["z"].shape)
82
- ```
83
 
84
- ## Model Usage
 
 
85
 
86
- ### Requirements
87
 
88
- ```bash
89
- pip install torch transformers huggingface_hub safetensors
90
- ```
91
 
92
  ### Load the Pretraining Model (SSL Encoder)
93
 
94
- ```py
95
  import torch
96
  from transformers import AutoModel
97
 
98
  model = AutoModel.from_pretrained(
99
      "SaeedLab/NeuroCLR",
100
-     subfolder="pretraining",
101
      trust_remote_code=True
102
  )
103
 
104
  model.eval()
105
 
106
- x = torch.randn(4, 1, 128) # [batch, sequence_length, embedding_dim]
107
 
108
  with torch.no_grad():
109
      outputs = model(x)
110
 
111
  print(outputs["h"].shape) # [4, 128]
112
- print(outputs["z"].shape) # [4, projector_out2]
113
- ```
114
-
115
- This repository is public, and no Hugging Face authentication is required to download or use the model.
116
-
117
- Users may see a warning when accessing the model without authentication. This warning is harmless and can be safely ignored.
118
-
119
- However, users may optionally authenticate using their own Hugging Face account by passing their access_token as follows:
120
-
121
- ```py
122
- import torch
123
- from transformers import AutoModel
124
-
125
- model = AutoModel.from_pretrained(
126
-     "SaeedLab/NeuroCLR",
127
-     subfolder="pretraining",
128
-     trust_remote_code=True,
129
-     access_token=your_huggingface_token
130
- )
131
-
132
- model.eval()
133
-
134
- x = torch.randn(4, 1, 128) # [batch, sequence_length, embedding_dim]
135
-
136
- with torch.no_grad():
137
-     outputs = model(x)
138
-
139
- print(outputs["h"].shape) # [4, 128]
140
- print(outputs["z"].shape) # [4, projector_out2]
141
  ```
142
 
143
-
144
  ### Load the Downstream Classification Model
145
  ```py
146
  import torch
 
15
 
16
  # NeuroCLR
17
 
18
+ **NeuroCLR** is a self-supervised learning (SSL) framework for learning **robust, disorder-agnostic neural representations** from raw, unlabeled resting-state fMRI (rs-fMRI) regional time-series data. NeuroCLR is designed for **multi-site generalization** and **transfer** to downstream disorder classification tasks with limited labeled data.
19
 
20
+ [[GitHub Repo](https://github.com/pcdslab/NeuroCLR)] | [[Cite](#citation)]
21
 
22
  ---
23
 
24
  ## Abstract
25
 
26
+ Self-supervised learning (SSL) is a powerful technique for reducing dependence on large labeled datasets. The availability of large-scale, unannotated rs-fMRI data provides opportunities to develop robust machine-learning models for classification across heterogeneous sites and diverse cohorts.
27
+
28
+ In this work, we propose **NeuroCLR**, a novel SSL framework that learns invariant neural representations using contrastive objectives, spatial constraints, and augmented views of raw fMRI time-series data. NeuroCLR is pre-trained on data from more than 3,600 participants across 44 sites, comprising over 720,000 region-specific fMRI time series.
29
+
30
+ The resulting disorder-agnostic foundation model is fine-tuned for downstream classification tasks with limited labeled data and consistently outperforms both supervised deep-learning and SSL models trained on single disorders. NeuroCLR demonstrates strong cross-site generalizability and reliable transfer across diagnostic categories, enabling reproducible and scalable neuroimaging representation learning.
31
 
32
  ---
33
 
34
+ ## Model Structure
35
 
36
+ This repository provides **two loadable model artifacts**:
37
 
38
+ - **Root model (default)**
39
+ Self-supervised **pretraining encoder + projector** (contrastive SSL)
40
 
41
+ - **`classification/` subfolder**
42
+ Encoder + **ResNet1D classification head** for downstream tasks
43
+
44
+ All models rely on **custom architectures**, so `trust_remote_code=True` is required.
45
 
46
  ---
47
 
48
  ## Model Details
49
 
50
+ ### 1) Pretraining Model (Default, Loaded from Repo Root)
51
+
52
  - **Input**: region-wise rs-fMRI time series
53
+ Shape: **`[B, 1, L]`**, where `L = 128` time points
54
  - **Output**:
55
  - `h`: pooled representation, shape **`[B, 128]`**
56
+ - `z`: projected representation, shape **`[B, projector_out_dim]`**
57
 
58
+ This model is intended for:
59
+ - representation learning
60
+ - feature extraction
61
+ - transfer learning
62
 
63
  ---
64
 
65
+ ### 2) Classification Model (`classification/`)
66
 
67
+ - **Input**: ROI-by-time representation
68
+ Shape: **`[B, N_ROIs, 128]`** (e.g., `N_ROIs = 200`)
69
+ - **Output**:
70
+ - `logits`: shape **`[B, num_labels]`**
71
+ - `loss`: returned when labels are provided
72
 
73
+ > **Note**
74
+ > The encoder is bundled with the classification model. Depending on the configuration used at training/export time, it may be frozen for downstream training (recommended).
75
+ > See the GitHub repository for training and fine-tuning scripts.
76
 
77
+ ---
78
 
79
+ ## Quickstart (PyTorch)
 
 
80
 
81
  ### Load the Pretraining Model (SSL Encoder)
82
 
83
+ ```python
84
  import torch
85
  from transformers import AutoModel
86
 
87
  model = AutoModel.from_pretrained(
88
      "SaeedLab/NeuroCLR",
89
      trust_remote_code=True
90
  )
91
 
92
  model.eval()
93
 
94
+ x = torch.randn(4, 1, 128) # [batch, 1, time_points]
95
 
96
  with torch.no_grad():
97
      outputs = model(x)
98
 
99
  print(outputs["h"].shape) # [4, 128]
100
+ print(outputs["z"].shape)
101
  ```
102
 
 
103
  ### Load the Downstream Classification Model
104
  ```py
105
  import torch