OCTFlow v1 (2026-06-21): 13 weights (core+downstream) + S0-S7 report + experiment log; public release
license: other
tags:
- ophthalmology
- OCT
- fundus
- medical-imaging
- diffusion
- stable-diffusion-3
- segmentation
- instruction-tuning
library_name: diffusers
OCTFlow v1 — Unified Generative Instruction Foundation Model for Ophthalmic Imaging
Frozen v1 · 2026-06-21. OCTFlow is a single SD3-medium model (16-channel KL-VAE + 2B MMDiT, rectified flow), instruction-tuned in the Vision-Banana style as an ophthalmic foundation model. One set of weights handles generation, multi-scheme segmentation, and denoising across 9 imaging modalities; the text prompt selects the task. Download with `hf download MaybeRichard/OCTFlow --revision v1`.
What it does (one base, switch by prompt)
- Generation (T2I) — 9 modalities: OCT B-scan, color fundus, SLO, UWF fundus, OCTA en-face, OCT en-face, FA, slit-lamp, IR-SLO.
- Instruction segmentation — OCT retinal layers (9/5/3 + arbitrary unseen counts + single-layer selection + any colors), OCT fluid (IRF/SRF/PED), OCTA FAZ & large vessels, fundus vessels & disc/cup, OCTA500 5-layer.
- Denoising — OCT speckle (generative i2i).
- Disease classification — frozen-feature linear probe (14 OCT/fundus/UWF tasks).
- Semantic-synthesis & data augmentation — mask→image generation; labeled synthetic data (shareable/privacy).
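The classification entry above relies on a frozen-feature linear probe: the backbone stays fixed and only a linear head is trained on its features. The following is a minimal sketch of that idea, not the project's probe code. The features here are synthetic stand-ins (random clusters), and the optimizer settings, the train/test split, and the function names `fit_linear_probe` and `top1_accuracy` are illustrative assumptions.

```python
import numpy as np

def fit_linear_probe(feats, labels, num_classes, lr=0.5, steps=300):
    """Train only a softmax-regression head (W, b) on fixed feature vectors."""
    n, d = feats.shape
    W, b = np.zeros((d, num_classes)), np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # d(cross-entropy)/d(logits)
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def top1_accuracy(feats, labels, W, b):
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(((feats @ W + b).argmax(axis=1) == labels).mean())

# Synthetic stand-in for frozen backbone features (assumption: NOT model outputs).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
feats = rng.normal(size=(200, 8)) + 3.0 * labels[:, None]

# Hold out the last 50 samples; standardize with training statistics only.
train, test = slice(0, 150), slice(150, 200)
mu, sigma = feats[train].mean(axis=0), feats[train].std(axis=0)
feats = (feats - mu) / sigma

W, b = fit_linear_probe(feats[train], labels[train], num_classes=2)
acc = top1_accuracy(feats[test], labels[test], W, b)
```

In a real probe, `feats` would be replaced by features extracted once from the frozen backbone; which layer and pooling the project uses is not stated here.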
Honest results (see results/octflow_downstream_report.html + EXPERIMENT_LOG.md)
Per task, the model is competitive but not state of the art; the genuine differentiation is unification, zero-shot instruction generalization, and shareable data.
- Classification (14 tasks, linear probe, mean top1): DINOv2 0.849 ≥ RETFound 0.844 ≥ ours 0.837 > DINOv3 0.824 > MIRAGE 0.808 > VisionFM 0.796. Ours 3rd/7, never per-task #1.
- Segmentation (vs end-to-end fine-tuned FM @512, fair): ours competitive; loses most tasks to the strongest fine-tuned FM (esp. DINOv2) by 0.03–0.09, wins OCTA large-vessel, ties FIVES vessel.
- Denoising: fine-tuned FM ≥ ours on PSNR/SSIM; ours wins perceptual LPIPS (0.135 vs 0.37–0.41).
- Data augmentation (classification §5 / segmentation §6): synthetic ≈ or < classical augmentation for raw point-gain; value is in shareable/customizable data, not point-gain.
- ★ Zero-shot instruction generalization (the moat): trained only on 9/5/3-layer schemes; on unseen layer counts (8/7/6/4/2) mean mIoU 0.441 ≈ seen 0.430; cross-device zero-shot (OCTA500, never trained) binary retina IoU 0.897. A fixed-head discriminative FM / U-Net cannot do this.
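The segmentation and zero-shot numbers above are reported as IoU and mIoU. As a reference for how such scores are typically computed from label maps, here is a minimal sketch. It assumes integer class maps and averages over every class that appears in either map; whether the reported figures include the background class or use a different averaging rule is not stated, so treat `per_class_iou` and `mean_iou` as illustrative helpers rather than the evaluation code.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """IoU per class id in [0, num_classes); NaN where a class is absent from both maps."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes present in pred or target."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))

# Toy 2x4 label maps with 3 classes (0 = background).
target = np.array([[0, 0, 1, 1],
                   [0, 2, 2, 1]])
pred = np.array([[0, 0, 1, 1],
                 [0, 2, 1, 1]])

ious = per_class_iou(pred, target, num_classes=3)
miou = mean_iou(pred, target, num_classes=3)

# Binary foreground IoU: collapse all non-background labels into one class.
binary_iou = per_class_iou((pred > 0).astype(int), (target > 0).astype(int), 2)[1]
```

On this toy pair, one pixel of class 2 is predicted as class 1, so the per-class scores differ while the binary foreground mask still matches exactly.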
Weights (weights/, optimizer state stripped, bf16)
| file | role |
|---|---|
| `sd3_multimodal_base_v2_step240000.pt` | T2I base (generation + probe backbone) |
| `sd3_oct_stageA_v3_step20000.pt` | OCT domain-adapt init (warm-start) |
| `sd3_vb_stageC_v3a_step30000.pt` | instruction model (multi-scheme seg + zero-shot, §7) |
| `sd3_vb_layer_100k_step30000.pt` | 9-layer seg anchor (v3a recipe) |
| `sd3_vb_layer_100k_v3b_step15000.pt` | 9-layer seg + decoded-loss (sharper thin layers) |
| `sd3_vb_mask2img_step12000.pt` | mask→image generator (§6 semantic synthesis) |
| `sd3_vb_denoise_100k_step4000.pt` | OCT denoiser |
| `sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt` | downstream seg specialists |
| `sd3_vb_octa500_100k_step4000.pt` | OCTA500 5-layer specialist |
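The downstream specialists row uses shell-style brace shorthand for several files. A small sketch of expanding that shorthand into concrete file names follows; it assumes the names follow the pattern literally, which has not been checked against the repository listing, and `expand_braces` is a hypothetical helper.

```python
import re

def expand_braces(pattern):
    """Expand the first {a,b,c} group in a name; recurse for any remaining groups."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    names = []
    for option in match.group(1).split(","):
        names.extend(expand_braces(head + option + tail))
    return names

# Pattern copied from the weights table.
specialists = expand_braces(
    "sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt"
)
for name in specialists:
    print(name)
```

The expansion yields five specialist files, which together with the eight individually named files gives 13 weights.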
Usage
- `hf download MaybeRichard/OCTFlow --revision v1 --local-dir octflow_v1`
- Untar `octflow-raev2-code.tar.gz`; install via `uv sync` (PyTorch 2.10+cu128). Load checkpoints with `torch.load(..., weights_only=False)`.
- Put a weight at `pilot/path1/results/<run>/checkpoints/`, set dataset paths, and run the matching script in `pilot/path1/scripts/foundation/` (e.g. `seg_instr_eval.py`, `zeroshot_spectrum.py`, `denoise_eval.py`).
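The step of placing a weight under the results tree can be scripted. The sketch below copies one downloaded file into `pilot/path1/results/<run>/checkpoints/`. The helper name `stage_checkpoint`, the run name, and the directory the code archive was extracted into are all hypothetical; how each evaluation script picks its run directory is not described here and should be checked in the scripts themselves.

```python
import shutil
from pathlib import Path

def stage_checkpoint(weight_file, code_root, run_name):
    """Copy a weight into <code_root>/pilot/path1/results/<run_name>/checkpoints/."""
    weight_file = Path(weight_file)
    dest_dir = (
        Path(code_root) / "pilot" / "path1" / "results" / run_name / "checkpoints"
    )
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the run folder if missing
    dest = dest_dir / weight_file.name
    shutil.copy2(weight_file, dest)              # copy, keeping the file name
    return dest

# Example call (paths and run name are placeholders, not project defaults):
# stage_checkpoint(
#     "octflow_v1/weights/sd3_vb_denoise_100k_step4000.pt",
#     "octflow_code",
#     "denoise_demo",
# )
```

Because `weights_only=False` lets `torch.load` unpickle arbitrary Python objects, it is worth staging only checkpoints obtained from the pinned `v1` revision.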
Notes / limitations
- Pilot-scale; not clinically validated (external multi-center validation + reader study are future work).
- Built on gated `stabilityai/stable-diffusion-3-medium-diffusers` (accept its license).
- Env: base conda, torch 2.10.0+cu128, diffusers 0.37, transformers 5.3.