OCTFlow v1 (2026-06-21): 13 weights (core+downstream) + S0-S7 report + experiment log; public release
metadata
license: other
tags:
  - ophthalmology
  - OCT
  - fundus
  - medical-imaging
  - diffusion
  - stable-diffusion-3
  - segmentation
  - instruction-tuning
library_name: diffusers

OCTFlow v1 — Unified Generative Instruction Foundation Model for Ophthalmic Imaging

Frozen v1 · 2026-06-21. OCTFlow is a single SD3-medium model (16-channel KL-VAE + 2B MMDiT, rectified flow), instruction-tuned in the Vision-Banana style as an ophthalmic foundation model. One set of base weights covers generation, multi-scheme segmentation, and denoising across 9 imaging modalities; the text prompt selects the task. Download with: hf download MaybeRichard/OCTFlow --revision v1

What it does (one base, switch by prompt)

  • Generation (T2I): 9 modalities (OCT B-scan, color fundus, SLO, UWF fundus, OCTA en-face, OCT en-face, FA, slit-lamp, IR-SLO).
  • Instruction segmentation: OCT retinal layers (9-, 5-, and 3-layer schemes, plus arbitrary unseen layer counts, single-layer selection, and arbitrary mask colors), OCT fluid (IRF/SRF/PED), OCTA FAZ and large vessels, fundus vessels and disc/cup, OCTA500 5-layer.
  • Denoising: OCT speckle removal (generative image-to-image).
  • Disease classification: frozen-feature linear probe (14 OCT/fundus/UWF tasks).
  • Semantic synthesis and data augmentation: mask→image generation, producing labeled synthetic data that is shareable and privacy-motivated.

Honest results (see results/octflow_downstream_report.html + EXPERIMENT_LOG.md)

Per task, OCTFlow is competitive but not state of the art. Its genuine differentiation is unification, zero-shot instruction generalization, and shareable data.

  • Classification (14 tasks, linear probe, mean top-1): DINOv2 0.849 ≥ RETFound 0.844 ≥ ours 0.837 > DINOv3 0.824 > MIRAGE 0.808 > VisionFM 0.796. Ours ranks 3rd of 7 and is never first on any individual task.
  • Segmentation (fair comparison against end-to-end fine-tuned FMs at 512 resolution): ours is competitive. It loses most tasks to the strongest fine-tuned FM (especially DINOv2) by 0.03–0.09, wins OCTA large-vessel, and ties FIVES vessel.
  • Denoising: fine-tuned FMs match or beat ours on PSNR/SSIM; ours wins on perceptual LPIPS (0.135 vs 0.37–0.41).
  • Data augmentation (classification §5 / segmentation §6): synthetic data matches or trails classical augmentation in raw point gain; its value is shareable, customizable data rather than point gain.
  • ★ Zero-shot instruction generalization (the moat): the model was trained only on 9/5/3-layer schemes, yet on unseen layer counts (8/7/6/4/2) it reaches mean mIoU 0.441, on par with 0.430 on seen schemes. Cross-device zero-shot on OCTA500 (never seen in training) gives binary retina IoU 0.897. A fixed-head discriminative FM or U-Net cannot do this.
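
For readers less familiar with the segmentation metrics quoted above: the IoU of one class is the number of pixels where prediction and reference agree on that class, divided by the number of pixels where either one marks it, and mIoU averages this over classes. The sketch below is a dependency-free illustration on toy label maps, not the project's evaluation code. How the OCTFlow scripts treat background or absent classes is not stated here, so skipping classes missing from both maps is an assumption, and the function names are illustrative.

```python
def class_iou(pred, target, cls):
    """IoU of one class between two same-shaped 2D label maps (lists of rows)."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = (p == cls), (t == cls)
            inter += in_pred and in_target
            union += in_pred or in_target
    return inter / union if union else None  # None: class absent from both maps


def mean_iou(pred, target, classes):
    """Average IoU over classes present in at least one map (assumed convention)."""
    scores = [class_iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s is not None]
    if not scores:
        raise ValueError("none of the requested classes appear in either map")
    return sum(scores) / len(scores)


def binarize(label_map):
    """Collapse every non-zero label to 1, e.g. 'retina vs background'."""
    return [[int(v != 0) for v in row] for row in label_map]


# Toy 2x4 maps: 0 = background, 1 and 2 = two layers (illustrative only).
target = [[0, 1, 1, 2],
          [0, 1, 2, 2]]
pred = [[0, 1, 2, 2],
        [0, 1, 2, 2]]

print(round(mean_iou(pred, target, classes=[1, 2]), 3))   # 0.708
print(class_iou(binarize(pred), binarize(target), 1))     # 1.0
```

In the toy case the layer boundary is misplaced by one pixel, which lowers the per-layer mIoU while the binary retina IoU stays perfect. This is why a binary retina IoU and a multi-layer mIoU are not directly comparable numbers.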

Weights (weights/, optimizer state stripped, bf16)

| File | Role |
| --- | --- |
| sd3_multimodal_base_v2_step240000.pt | T2I base (generation + probe backbone) |
| sd3_oct_stageA_v3_step20000.pt | OCT domain-adapt init (warm-start) |
| sd3_vb_stageC_v3a_step30000.pt | instruction model (multi-scheme seg + zero-shot, §7) |
| sd3_vb_layer_100k_step30000.pt | 9-layer seg anchor (v3a recipe) |
| sd3_vb_layer_100k_v3b_step15000.pt | 9-layer seg + decoded-loss (sharper thin layers) |
| sd3_vb_mask2img_step12000.pt | mask→image generator (§6 semantic synthesis) |
| sd3_vb_denoise_100k_step4000.pt | OCT denoiser |
| sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt | downstream seg specialists |
| sd3_vb_octa500_100k_step4000.pt | OCTA500 5-layer specialist |
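
The brace pattern in the list above is shorthand for five separate files. A minimal sketch, assuming the braces follow shell-style expansion, lists every filename individually. The helper `expand_braces` is illustrative and not part of the release.

```python
import re

# Filenames as listed above; one entry uses a {a,b,c} brace pattern.
WEIGHT_PATTERNS = [
    "sd3_multimodal_base_v2_step240000.pt",
    "sd3_oct_stageA_v3_step20000.pt",
    "sd3_vb_stageC_v3a_step30000.pt",
    "sd3_vb_layer_100k_step30000.pt",
    "sd3_vb_layer_100k_v3b_step15000.pt",
    "sd3_vb_mask2img_step12000.pt",
    "sd3_vb_denoise_100k_step4000.pt",
    "sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt",
    "sd3_vb_octa500_100k_step4000.pt",
]


def expand_braces(pattern: str) -> list[str]:
    """Expand a single {a,b,c} group; names without braces pass through."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


ALL_WEIGHTS = [name for p in WEIGHT_PATTERNS for name in expand_braces(p)]

for name in ALL_WEIGHTS:
    print(name)
print(len(ALL_WEIGHTS), "files")  # 13 files
```

Expanded this way, the eight single-file entries plus the five specialists give 13 weight files.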

Usage

  1. hf download MaybeRichard/OCTFlow --revision v1 --local-dir octflow_v1
  2. Untar octflow-raev2-code.tar.gz and install dependencies with uv sync (PyTorch 2.10+cu128). Load checkpoints with torch.load(..., weights_only=False).
  3. Place a weight file in pilot/path1/results/<run>/checkpoints/, set the dataset paths, and run the matching script in pilot/path1/scripts/foundation/ (e.g. seg_instr_eval.py, zeroshot_spectrum.py, denoise_eval.py).
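
Step 3 can be scripted. The sketch below uses only the Python standard library to copy a downloaded weight into the pilot/path1/results/<run>/checkpoints/ layout. The function name, the run name, and the location of the untarred code are assumptions for illustration; only the directory layout comes from the steps above.

```python
from pathlib import Path
import shutil


def place_checkpoint(weight_file, code_root, run_name):
    """Copy one weight file into <code_root>/pilot/path1/results/<run_name>/checkpoints/.

    code_root is wherever the code tarball was extracted (assumed, not
    specified by this README); run_name is a label you choose.
    """
    src = Path(weight_file)
    if not src.is_file():
        raise FileNotFoundError(f"weight file not found: {src}")
    dest_dir = (Path(code_root) / "pilot" / "path1" / "results"
                / run_name / "checkpoints")
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the run folder if missing
    return Path(shutil.copy2(src, dest_dir / src.name))


# Example call (hypothetical code root and run name):
# place_checkpoint(
#     "octflow_v1/weights/sd3_vb_denoise_100k_step4000.pt",
#     code_root="path/to/untarred/code",
#     run_name="denoise_demo",
# )
```

Copying rather than moving leaves the download directory intact, so the same weight can be reused under several run names.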

Notes / limitations

  • Pilot-scale; not clinically validated (external multi-center validation + reader study are future work).
  • Built on the gated stabilityai/stable-diffusion-3-medium-diffusers; you must accept its license to access it.
  • Environment: base conda, torch 2.10.0+cu128, diffusers 0.37, transformers 5.3.