usama10 committed · Commit 1e27037 (verified) · Parent: 19f94ee

Upload folder using huggingface_hub

Files changed (3):

  1. README.md (+76, −0)
  2. diffusion_best.pt (+3, −0)
  3. vae_best.pt (+3, −0)
README.md ADDED
@@ -0,0 +1,76 @@
---
license: cc-by-nc-4.0
tags:
- diffusion
- retinal-fundus
- diabetic-retinopathy
- medical-imaging
- image-to-image
- conditional-generation
- stable-diffusion
datasets:
- usama10/retinal-dr-longitudinal
pipeline_tag: image-to-image
---
15
+
16
+ # Conditional Latent Diffusion Model for Retinal Future-State Synthesis
17
+
18
+ Trained model weights for predicting two-year follow-up retinal fundus images from baseline photographs and clinical metadata.
19
+
20
+ ## Model Description
21
+
22
+ This model adapts Stable Diffusion 1.5 for longitudinal retinal image prediction. It consists of two components:
23
+
24
+ 1. **Fine-tuned VAE** (`vae_best.pt`, 320 MB): SD 1.5 VAE encoder/decoder fine-tuned on retinal fundus images with L1 + SSIM + LPIPS + KL loss. Achieves SSIM 0.954 on reconstruction.
25
+
26
+ 2. **Conditional U-Net** (`diffusion_best.pt`, 13 GB): 860M-parameter denoising U-Net with 15-channel input (4 noisy latent + 4 baseline latent + 7 clinical feature maps). Trained for 500 epochs with cosine LR schedule, EMA, and classifier-free guidance dropout.
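The 15-channel input means SD 1.5's pretrained `conv_in` layer, which expects 4 channels, has to be widened before the U-Net checkpoint can be used. The exact initialization used during training is not documented in this card; the sketch below, using a hypothetical `expand_conv_in` helper, shows one common scheme: copy the pretrained kernel into the first four input channels and zero-initialize the rest, so the widened layer initially ignores the new conditioning channels.

```python
import torch
import torch.nn as nn

def expand_conv_in(conv: nn.Conv2d, new_in_channels: int) -> nn.Conv2d:
    """Widen a conv layer's input, keeping pretrained weights for the original channels."""
    new_conv = nn.Conv2d(
        new_in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
    )
    with torch.no_grad():
        new_conv.weight.zero_()                           # extra channels start at zero
        new_conv.weight[:, :conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# e.g. unet.conv_in = expand_conv_in(unet.conv_in, 15)
```

With zero-initialized extra channels, the widened layer reproduces the pretrained layer's output exactly until fine-tuning starts to use the conditioning channels.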
## Performance

| Metric | Value    |
|--------|----------|
| SSIM   | 0.762    |
| PSNR   | 17.26 dB |
| LPIPS  | 0.379    |
| FID    | 107.28   |

Evaluated on 110 held-out test image pairs.
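For context, PSNR in the table presumably follows the standard definition over the image dynamic range (the exact range used for evaluation is not stated here):

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(max_val**2 / mse)
```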
## Usage

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel

# Load the fine-tuned VAE on top of the SD 1.5 base weights
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
vae_state = torch.load("vae_best.pt", map_location="cpu")
if "model_state_dict" in vae_state:
    vae_state = vae_state["model_state_dict"]
vae.load_state_dict(vae_state, strict=False)

# Load U-Net (requires modified conv_in for 15 input channels)
unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
# ... modify conv_in and load checkpoint
# See full inference code at the GitHub repository
```
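The conditioning layout described above (4 noisy latent + 4 baseline latent + 7 clinical channels) can be assembled as follows; the 64×64 latent size and the per-feature normalization are assumptions, since the card does not state the training resolution:

```python
import torch

latent_h = latent_w = 64  # 512x512 input -> 64x64 SD latents (assumed)

noisy_latent = torch.randn(1, 4, latent_h, latent_w)     # current diffusion state
baseline_latent = torch.randn(1, 4, latent_h, latent_w)  # VAE-encoded baseline image
clinical = torch.rand(1, 7)                              # 7 normalized clinical features (assumed)

# Broadcast each scalar feature to a constant spatial map, then stack all channels
clinical_maps = clinical.view(1, 7, 1, 1).expand(-1, -1, latent_h, latent_w)
unet_input = torch.cat([noisy_latent, baseline_latent, clinical_maps], dim=1)
# unet_input now has shape (1, 15, 64, 64), matching the modified conv_in
```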
## Links

- **Code:** [github.com/Usama1002/retinal-diffusion](https://github.com/Usama1002/retinal-diffusion)
- **Dataset:** [huggingface.co/datasets/usama10/retinal-dr-longitudinal](https://huggingface.co/datasets/usama10/retinal-dr-longitudinal)

## Citation

```bibtex
@article{usama2026retinal,
  title={Conditional Latent Diffusion for Predictive Retinal Fundus Image Synthesis from Baseline Imaging and Clinical Metadata},
  author={Usama, Muhammad and Pazo, Emmanuel Eric and Li, Xiaorong and Liu, Juping},
  journal={Computers in Biology and Medicine (under review)},
  year={2026}
}
```

## License

CC BY-NC 4.0. Non-commercial research use only.
diffusion_best.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:954df800e580074b38d1bae261eb3c5600899c0257d9c47f66e8f2ba01721ece
size 13753967793
vae_best.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:238a627bae7ca14a7db7053b5dca1d9ce07e9af6ac9160b0a329bc120aabdc19
size 334695957