---
license: other
tags:
- ophthalmology
- OCT
- fundus
- medical-imaging
- diffusion
- stable-diffusion-3
- segmentation
- instruction-tuning
library_name: diffusers
---

# OCTFlow v1: Unified Generative Instruction Foundation Model for Ophthalmic Imaging

**Frozen v1 · 2026-06-21.** A single SD3-medium (KL-VAE 16ch + 2B MMDiT, rectified flow) instruction-tuned (Vision-Banana style) ophthalmic foundation model: one model, change the **text prompt** to do generation, multi-scheme segmentation, and denoising across 9 imaging modalities.

`hf download MaybeRichard/OCTFlow --revision v1`

## What it does (one base, switch by prompt)

- **Generation (T2I)** in 9 modalities: OCT B-scan, color fundus, SLO, UWF fundus, OCTA en-face, OCT en-face, FA, slit-lamp, IR-SLO.
- **Instruction segmentation**: OCT retinal layers (9/5/3 + **arbitrary unseen counts** + single-layer selection + any colors), OCT fluid (IRF/SRF/PED), OCTA FAZ & large vessels, fundus vessels & disc/cup, OCTA500 5-layer.
- **Denoising**: OCT speckle (generative i2i).
- **Disease classification**: frozen-feature linear probe (14 OCT/fundus/UWF tasks).
- **Semantic synthesis & data augmentation**: mask→image generation; labeled synthetic data (shareable/privacy).

## Honest results (see `results/octflow_downstream_report.html` + `EXPERIMENT_LOG.md`)

Positioning is **competitive-not-SOTA per task**; the genuine differentiation is unification + zero-shot instruction generalization + shareable data.

- **Classification (14 tasks, linear probe, mean top1)**: DINOv2 0.849 ≥ RETFound 0.844 ≥ **ours 0.837** > DINOv3 0.824 > MIRAGE 0.808 > VisionFM 0.796. Ours 3rd/7, never per-task #1.
- **Segmentation (vs end-to-end fine-tuned FM @512, fair)**: ours competitive; loses most tasks to the strongest fine-tuned FM (esp. DINOv2) by 0.03–0.09, wins OCTA large-vessel, ties FIVES vessel.
- **Denoising**: fine-tuned FM ≥ ours on PSNR/SSIM; ours wins perceptual LPIPS (0.135 vs 0.37–0.41).
- **Data augmentation (classification §5 / segmentation §6)**: synthetic ≈ or < classical augmentation for raw point-gain; value is in shareable/customizable data, not point-gain.
- **★ Zero-shot instruction generalization (the moat)**: trained only on 9/5/3-layer schemes; on **unseen** layer counts (8/7/6/4/2) mean mIoU **0.441 ≈ seen 0.430**; cross-device zero-shot (OCTA500, never trained) binary retina IoU **0.897**. A fixed-head discriminative FM / U-Net cannot do this.

## Weights (`weights/`, optimizer state stripped, bf16)

| file | role |
|---|---|
| `sd3_multimodal_base_v2_step240000.pt` | T2I base (generation + probe backbone) |
| `sd3_oct_stageA_v3_step20000.pt` | OCT domain-adapt init (warm-start) |
| `sd3_vb_stageC_v3a_step30000.pt` | **instruction model** (multi-scheme seg + zero-shot, §7) |
| `sd3_vb_layer_100k_step30000.pt` | 9-layer seg anchor (v3a recipe) |
| `sd3_vb_layer_100k_v3b_step15000.pt` | 9-layer seg + decoded-loss (sharper thin layers) |
| `sd3_vb_mask2img_step12000.pt` | mask→image generator (§6 semantic synthesis) |
| `sd3_vb_denoise_100k_step4000.pt` | OCT denoiser |
| `sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt` | downstream seg specialists |
| `sd3_vb_octa500_100k_step4000.pt` | OCTA500 5-layer specialist |

## Usage

1. `hf download MaybeRichard/OCTFlow --revision v1 --local-dir octflow_v1`
2. Untar `octflow-raev2-code.tar.gz`; install via `uv sync` (PyTorch 2.10+cu128). Load checkpoints with **`torch.load(..., weights_only=False)`**.
3. Put a weight at `pilot/path1/results//checkpoints/`, set dataset paths, and run the matching script in `pilot/path1/scripts/foundation/` (e.g. `seg_instr_eval.py`, `zeroshot_spectrum.py`, `denoise_eval.py`).

## Notes / limitations

- Pilot-scale; **not clinically validated** (external multi-center validation + reader study are future work).
- Built on gated `stabilityai/stable-diffusion-3-medium-diffusers` (accept its license).
- Env: base conda, torch 2.10.0+cu128, diffusers 0.37, transformers 5.3.
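
As a convenience for the weights table and usage steps above, the following is a minimal sketch of how one might resolve a checkpoint path after downloading to `octflow_v1` and load it the way step 2 describes. The file names are copied from the table, but the task keys (`"instruction"`, `"denoise"`, ...) and the helpers `expand_braces`, `weight_path`, and `load_checkpoint` are hypothetical names introduced here and are not part of the released code. The structure of the loaded checkpoint is not documented in this README, so the sketch returns it untouched.

```python
import re
from pathlib import Path

# File names copied from the weights table. The dictionary keys are
# illustrative labels chosen for this sketch, not repository identifiers.
WEIGHTS = {
    "t2i_base": "sd3_multimodal_base_v2_step240000.pt",
    "oct_stage_a": "sd3_oct_stageA_v3_step20000.pt",
    "instruction": "sd3_vb_stageC_v3a_step30000.pt",
    "layer9": "sd3_vb_layer_100k_step30000.pt",
    "layer9_v3b": "sd3_vb_layer_100k_v3b_step15000.pt",
    "mask2img": "sd3_vb_mask2img_step12000.pt",
    "denoise": "sd3_vb_denoise_100k_step4000.pt",
    "octa500": "sd3_vb_octa500_100k_step4000.pt",
}

# The table lists the five downstream specialists as one brace pattern.
SPECIALIST_PATTERN = "sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt"


def expand_braces(pattern: str) -> dict[str, str]:
    """Expand a single {a,b,c} group into {alternative: expanded_string}."""
    match = re.search(r"\{([^{}]+)\}", pattern)
    if match is None:
        return {"": pattern}
    head, tail = pattern[: match.start()], pattern[match.end():]
    return {alt: head + alt + tail for alt in match.group(1).split(",")}


# Register the specialists under their short names (faz, vessel, ...).
WEIGHTS.update(expand_braces(SPECIALIST_PATTERN))


def weight_path(root: str, task: str) -> Path:
    """Return <root>/weights/<file> for a task key defined in WEIGHTS."""
    return Path(root) / "weights" / WEIGHTS[task]


def load_checkpoint(path: Path):
    """Load a checkpoint as described in the usage steps.

    weights_only=False unpickles arbitrary Python objects, so this should
    only be pointed at files from a source you trust.
    """
    import torch  # imported lazily so path resolution works without torch

    return torch.load(path, map_location="cpu", weights_only=False)


if __name__ == "__main__":
    for task in ("instruction", "faz", "denoise"):
        print(task, "->", weight_path("octflow_v1", task))
```

Keeping the table in one dictionary means an evaluation script can switch tasks by key alone; copying the resolved file into the checkpoint directory from step 3 is left to the reader, since that path depends on the experiment name.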