---
base_model:
- YukunZhou/RETFound_mae_natureCFP
- UFNLP/gatortronS
tags:
- medical-imaging
- ophthalmology
- vision-language-model
- multimodal-learning
- alzheimers-disease
- dementia
- retinal-imaging
datasets:
- uk-biobank
model-index:
- name: REVEAL
  results:
  - task:
      type: binary-classification
      name: Incident Alzheimer's Disease Prediction (within ~8.5 years)
    metrics:
    - type: AUROC
      value: 0.658
  - task:
      type: binary-classification
      name: Incident Dementia Prediction (within ~8.5 years)
    metrics:
    - type: AUROC
      value: 0.659
---

# REVEAL: Retinal-risk Vision-Language Early Alzheimer’s Learning

## Model Description

REVEAL is a multimodal vision-language model that aligns retinal fundus imaging with individualized clinical risk factors for early prediction of Alzheimer’s disease (AD) and dementia. The model learns joint representations from retinal morphology and structured health data transformed into clinical narratives.

REVEAL builds on pretrained medical foundation models and introduces a group-aware contrastive learning (GACL) strategy to capture clinically meaningful multimodal relationships. It is designed to support early disease risk stratification and multimodal biomarker discovery.

---

## Model Architecture

REVEAL is composed of:

- **Image Encoder:** RETFound retinal imaging foundation model
- **Text Encoder:** GatorTron clinical language model
- **Projection Layers:** Trainable modules mapping image and text embeddings into a shared latent space
- **Contrastive Learning Module:** Group-aware contrastive learning for multimodal alignment

The framework operates in two stages:

1. Multimodal representation learning via contrastive vision-language alignment
2. Downstream risk prediction using the learned multimodal embeddings
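
The projection step can be sketched as follows. This is a minimal illustration, not the released implementation: encoder output sizes and initialization are assumptions (the card only fixes the shared projection dimension of 1024), and the real projection layers are trained rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed encoder output sizes; only PROJ_DIM = 1024 is stated in the card.
IMG_DIM, TXT_DIM, PROJ_DIM = 1024, 1024, 1024

# Stand-ins for the trainable projection layers (random here, learned in practice).
W_img = rng.normal(0.0, 0.02, size=(IMG_DIM, PROJ_DIM))
W_txt = rng.normal(0.0, 0.02, size=(TXT_DIM, PROJ_DIM))

def project(x, W):
    """Map an encoder embedding into the shared space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

img_emb = rng.normal(size=IMG_DIM)   # stand-in for a RETFound image embedding
txt_emb = rng.normal(size=TXT_DIM)   # stand-in for a GatorTron text embedding

z_img = project(img_emb, W_img)
z_txt = project(txt_emb, W_txt)
cosine = float(z_img @ z_txt)        # similarity used for multimodal alignment
```

Because both projections are L2-normalized, the dot product is exactly the cosine similarity referenced in the loss section below.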

---

## Training Data

### Dataset Source

The model was trained on multimodal data from the UK Biobank (https://www.ukbiobank.ac.uk/), a large population-scale biomedical dataset containing retinal imaging and clinical health variables.

### Cohort Composition

The dataset includes color fundus photographs and clinical risk factor data from 39,242 participants:

- Training set: 30,462 participants
- Validation set: 3,384 participants
- Test set: 5,396 participants

The training and validation sets contained only participants who were cognitively normal at baseline. Individuals who developed incident AD or dementia were reserved for downstream evaluation.

---

### Imaging Data

- Imaging modality: Color fundus photography
- Initial dataset: 136,994 retinal images
- Quality-controlled dataset: 66,251 images

Retinal morphometric features were extracted using the AutoMorph pipeline, including:

- Optic nerve head measurements (cup-to-disc ratios)
- Vascular morphology metrics
- Vessel tortuosity and fractal measurements

---

### Clinical Risk Factors

Risk factors include:

#### Demographic
- Age
- Sex
- Socioeconomic status
- Ethnicity
- Employment status

#### General Health
- BMI
- HbA1c
- Blood pressure
- Cognitive test scores

#### Behavioral and Psychiatric
- Depression
- Sleep deprivation
- Smoking history
- Alcohol use
- Cannabis use

#### Lifestyle and Social
- Physical activity
- Social engagement
- Leisure activity

#### Diet
- Food intake patterns
- Beverage consumption
- Nutritional indicators

---

### Synthetic Clinical Text Generation

Structured clinical variables were converted into standardized clinical narratives using a large language model. Each participant’s risk factors were mapped into a predefined clinical template, making them compatible with vision-language training.
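
A minimal sketch of this template mapping, assuming hypothetical field names and wording (the actual prompt and template used by the authors are not published in this card):

```python
# Hypothetical narrative template -- field names and phrasing are
# illustrative only, not the template used to train REVEAL.
TEMPLATE = (
    "The participant is a {age}-year-old {sex} with a BMI of {bmi:.1f} "
    "and HbA1c of {hba1c:.1f}%. Smoking history: {smoking}. "
    "Physical activity level: {activity}."
)

def to_narrative(record: dict) -> str:
    """Map one participant's structured risk factors into a clinical narrative."""
    return TEMPLATE.format(**record)

text = to_narrative({
    "age": 62, "sex": "female", "bmi": 27.4,
    "hba1c": 5.6, "smoking": "former smoker", "activity": "moderate",
})
```

Standardizing every participant onto one template keeps the narratives comparable, so differences in the text encoder's embeddings reflect differences in risk factors rather than writing style.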

---

## Training Procedure

### Multimodal Representation Learning

REVEAL aligns fundus images and clinical narratives through contrastive vision-language learning. Both modalities are encoded and projected into a shared latent embedding space.

---

### Group-Aware Contrastive Learning (GACL)

REVEAL introduces a group-aware pairing strategy that:

- Identifies subjects with similar retinal morphology
- Identifies subjects with similar clinical risk profiles
- Forms positive training pairs across similar individuals

This lets the model learn clinically meaningful multimodal relationships rather than relying solely on subject-level pairings.
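
The pairing idea can be illustrated with a toy sketch. The feature vectors, similarity measure, and threshold value below are all assumptions for illustration, not the published pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature vectors for 6 subjects (in the real pipeline these would be
# retinal morphometry or clinical risk profiles, not random numbers).
profiles = rng.normal(size=(6, 8))
profiles /= np.linalg.norm(profiles, axis=1, keepdims=True)

SIM_THRESHOLD = 0.5  # assumed value; the card notes results are threshold-sensitive

sim = profiles @ profiles.T            # cosine similarity (rows are unit vectors)
positive_mask = sim >= SIM_THRESHOLD   # True where two subjects count as positives
np.fill_diagonal(positive_mask, True)  # each subject is always its own positive
```

The resulting boolean mask is what distinguishes GACL from standard CLIP-style training, where only the diagonal (subject-level) entries would be positive.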

---

### Loss Function

REVEAL uses a modified contrastive loss that supports multiple positive pairs per sample. Similarity is computed as the cosine similarity between image and text embeddings.
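
A numpy sketch of one such multi-positive contrastive loss, in the spirit of supervised contrastive learning; the exact formulation used by REVEAL may differ:

```python
import numpy as np

def multi_positive_contrastive_loss(img_z, txt_z, pos_mask, temperature=0.07):
    """
    Contrastive loss averaged over multiple positive matches per image.
    img_z, txt_z: L2-normalized embeddings, shape (N, D).
    pos_mask: boolean (N, N); pos_mask[i, j] marks text j as a positive for image i.
    """
    logits = (img_z @ txt_z.T) / temperature       # cosine similarities / tau
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean log-likelihood over each row's positives, negated and averaged.
    per_image = -(log_prob * pos_mask).sum(axis=1) / pos_mask.sum(axis=1)
    return float(per_image.mean())

rng = np.random.default_rng(0)
z_i = rng.normal(size=(4, 8)); z_i /= np.linalg.norm(z_i, axis=1, keepdims=True)
z_t = rng.normal(size=(4, 8)); z_t /= np.linalg.norm(z_t, axis=1, keepdims=True)
pos = np.eye(4, dtype=bool)   # subject-level positives only, for the demo
loss = multi_positive_contrastive_loss(z_i, z_t, pos)
```

With GACL, `pos` would contain additional off-diagonal `True` entries for group-matched subjects; with only the diagonal set, this reduces to the standard InfoNCE image-to-text loss.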

---

### Hyperparameters

- Projection dimension: 1024
- Batch size: 128
- Learning rate: 2.42e-4
- Weight decay: 0.0232
- Temperature: 0.07

Hyperparameters were optimized with Optuna (https://optuna.org/).
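
To illustrate the search idea without the Optuna dependency, here is a stdlib random search over hypothetical bounds; the search ranges and the toy objective are assumptions (the tuned values above are the published results, and the real objective would be validation performance):

```python
import math
import random

random.seed(0)

def sample_config():
    """Draw one hyperparameter configuration from assumed search ranges."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -3),   # log-uniform, assumed bounds
        "weight_decay": 10 ** random.uniform(-3, -1),
        "batch_size": random.choice([64, 128, 256]),
    }

def dummy_objective(cfg):
    """Placeholder score peaking near lr = 2.42e-4; real runs would use val AUROC."""
    return -abs(math.log10(cfg["learning_rate"]) - math.log10(2.42e-4))

best = max((sample_config() for _ in range(50)), key=dummy_objective)
```

Optuna replaces this uniform sampling with adaptive samplers (e.g. TPE) and pruning of unpromising trials, but the objective-driven loop is the same shape.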

---

## Intended Use

### Primary Use Cases

REVEAL is intended for research applications, including:

- Early risk stratification for Alzheimer’s disease and dementia
- Multimodal biomarker discovery
- Development of non-invasive screening strategies
- Population-level disease risk modeling
- Multimodal clinical representation learning

---

### Appropriate Use

The model should be used:

- For research or exploratory clinical modeling
- With appropriate ethical and institutional review
- With external validation before application to new populations

---

### Out-of-Scope Use

The model is **not intended** for:

- Direct clinical diagnosis
- Medical decision-making without clinician oversight
- Deployment as a medical device
- Use in unvalidated populations

---

## Evaluation

REVEAL embeddings were evaluated with downstream support vector machine classifiers.

### Incident Alzheimer’s Disease Prediction
- AUROC: 0.658
- Balanced Accuracy: 0.610

### Incident Dementia Prediction
- AUROC: 0.659
- Balanced Accuracy: 0.605

Reported performance is the average across multiple random seeds.
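
To illustrate the evaluation protocol, here is a numpy-only sketch that computes rank-based AUROC on synthetic stand-ins; a trivial linear scorer substitutes for the SVM actually used, and the data are random, not UK Biobank:

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney): P(positive outranks negative)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

rng = np.random.default_rng(42)
emb = rng.normal(size=(200, 16))                          # stand-in frozen embeddings
labels = (emb[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
scores = emb[:, 0]   # trivial linear scorer standing in for the SVM
value = auroc(scores, labels)
```

In the actual protocol, an SVM would be fit on the frozen training-split embeddings and scored on the held-out test split, repeated across random seeds.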

---

## Limitations

- Training is limited to the UK Biobank cohort
- Performance is sensitive to the similarity-threshold selection
- Incident AD and dementia cases remain relatively few
- Synthetic clinical narrative generation may introduce bias
- Generalizability to other populations requires external validation

---

## Ethical Considerations

- Retinal images and clinical variables contain sensitive health data
- Predictions may influence how disease risk is interpreted
- Model outputs should not replace clinical judgment
- Use requires adherence to privacy, regulatory, and ethical guidelines

---

## Citation

If you use this model, please cite:

    @article{leem2026reveal,
      title={REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction},
      author={Leem, Seowung and Gu, Lin and You, Chenyu and Gong, Kuang and Fang, Ruogu},
      journal={MIDL 2026 (Under Review)},
      year={2026}
    }