	---
	license: other
	tags:
	- ophthalmology
	- OCT
	- fundus
	- medical-imaging
	- diffusion
	- stable-diffusion-3
	- segmentation
	- instruction-tuning
	library_name: diffusers
	---

	# OCTFlow v1 — Unified Generative Instruction Foundation Model for Ophthalmic Imaging

	Frozen v1 · 2026-06-21. OCTFlow is a single SD3-medium model (16-channel KL-VAE + 2B-parameter MMDiT, rectified flow),
	instruction-tuned in the Vision-Banana style as an ophthalmic foundation model. One model handles generation,
	multi-scheme segmentation, and denoising across 9 imaging modalities; the text prompt selects the task.
	Download with `hf download MaybeRichard/OCTFlow --revision v1`.

	## What it does (one base, switch by prompt)
	- Generation (T2I) — 9 modalities: OCT B-scan, color fundus, SLO, UWF fundus, OCTA en-face, OCT en-face, FA, slit-lamp, IR-SLO.
	- Instruction segmentation — OCT retinal layers (9/5/3 + arbitrary unseen counts + single-layer selection + any colors), OCT fluid (IRF/SRF/PED), OCTA FAZ & large vessels, fundus vessels & disc/cup, OCTA500 5-layer.
	- Denoising — OCT speckle (generative i2i).
	- Disease classification — frozen-feature linear probe (14 OCT/fundus/UWF tasks).
	- Semantic-synthesis & data augmentation — mask→image generation; labeled synthetic data (shareable/privacy).

	## Honest results (see `results/octflow_downstream_report.html` + `EXPERIMENT_LOG.md`)
	Per task, OCTFlow is competitive but not state of the art; what sets it apart is unification, zero-shot instruction generalization, and shareable data.
	- Classification (14 tasks, linear probe, mean top1): DINOv2 0.849 ≥ RETFound 0.844 ≥ ours 0.837 > DINOv3 0.824 > MIRAGE 0.808 > VisionFM 0.796. Ours 3rd/7, never per-task #1.
	- Segmentation (fair comparison against end-to-end fine-tuned FMs at 512 resolution): ours is competitive; it loses most tasks to the strongest fine-tuned FM (esp. DINOv2) by 0.03–0.09, wins OCTA large-vessel, and ties FIVES vessel.
	- Denoising: fine-tuned FM ≥ ours on PSNR/SSIM; ours wins perceptual LPIPS (0.135 vs 0.37–0.41).
	- Data augmentation (classification §5 / segmentation §6): synthetic ≈ or < classical augmentation for raw point-gain; value is in shareable/customizable data, not point-gain.
	- ★ Zero-shot instruction generalization (the moat): trained only on 9/5/3-layer schemes; on unseen layer counts (8/7/6/4/2) mean mIoU is 0.441, on par with 0.430 on the seen schemes; cross-device zero-shot on OCTA500 (never seen in training) reaches binary retina IoU 0.897. A fixed-head discriminative FM or U-Net cannot do this.
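
For readers unfamiliar with the segmentation metrics quoted above, here is a minimal sketch of per-class IoU, mean IoU, and a binary (retina vs. background) IoU. It is not the evaluation code shipped with the release: it assumes the predicted and reference masks have already been converted to integer class IDs, that background is class `0`, and that classes absent from both maps are skipped when averaging. The function names are illustrative only.

```python
def class_ious(pred, target, num_classes):
    """Per-class IoU for two equally sized 2D label maps (lists of lists of int IDs)."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # The pixel belongs to class p in pred and class t in target,
                # so it enlarges the union of both classes but no intersection.
                union[p] += 1
                union[t] += 1
    # Classes that appear in neither map have no defined IoU and are omitted.
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(pred, target, num_classes):
    """Mean of the per-class IoUs (assumed convention: absent classes are skipped)."""
    ious = class_ious(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else float("nan")


def to_binary(label_map, background=0):
    """Collapse every non-background class into a single foreground class 1."""
    return [[0 if v == background else 1 for v in row] for row in label_map]


def binary_foreground_iou(pred, target, background=0):
    """IoU of the merged foreground region, e.g. whole retina vs. background."""
    ious = class_ious(to_binary(pred, background), to_binary(target, background), 2)
    return ious.get(1, float("nan"))


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]
    target = [[0, 1], [0, 1]]
    print(mean_iou(pred, target, num_classes=2))
    print(binary_foreground_iou(pred, target))
```

Whether the reported numbers include the background class in the mean is not stated here, so treat this as a reference definition rather than a reproduction of the reported protocol.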

	## Weights (`weights/`, optimizer state stripped, bf16)
	| file | role |
	|---|---|
	| `sd3_multimodal_base_v2_step240000.pt` | T2I base (generation + probe backbone) |
	| `sd3_oct_stageA_v3_step20000.pt` | OCT domain-adapt init (warm-start) |
	| `sd3_vb_stageC_v3a_step30000.pt` | instruction model (multi-scheme seg + zero-shot, §7) |
	| `sd3_vb_layer_100k_step30000.pt` | 9-layer seg anchor (v3a recipe) |
	| `sd3_vb_layer_100k_v3b_step15000.pt` | 9-layer seg + decoded-loss (sharper thin layers) |
	| `sd3_vb_mask2img_step12000.pt` | mask→image generator (§6 semantic synthesis) |
	| `sd3_vb_denoise_100k_step4000.pt` | OCT denoiser |
	| `sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt` | downstream seg specialists |
	| `sd3_vb_octa500_100k_step4000.pt` | OCTA500 5-layer specialist |
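
The brace pattern in the specialists row stands for five separate files, which brings the table to 13 checkpoints. The sketch below expands that pattern and reports which files are missing from a local `weights/` directory. The helper name `missing_weights` is hypothetical, and the example path assumes the repository was downloaded to `octflow_v1` as in the Usage section.

```python
from pathlib import Path

# Task names taken from the brace pattern in the table above.
SPECIALIST_TASKS = ["disccup", "faz", "fluid", "octavessel", "vessel"]

EXPECTED_WEIGHTS = [
    "sd3_multimodal_base_v2_step240000.pt",
    "sd3_oct_stageA_v3_step20000.pt",
    "sd3_vb_stageC_v3a_step30000.pt",
    "sd3_vb_layer_100k_step30000.pt",
    "sd3_vb_layer_100k_v3b_step15000.pt",
    "sd3_vb_mask2img_step12000.pt",
    "sd3_vb_denoise_100k_step4000.pt",
    *[f"sd3_vb_{task}_v3b_step4000.pt" for task in SPECIALIST_TASKS],
    "sd3_vb_octa500_100k_step4000.pt",
]


def missing_weights(weights_dir):
    """Return the expected checkpoint filenames that are not present in weights_dir."""
    weights_dir = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (weights_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed location: <download dir>/weights
    missing = missing_weights("octflow_v1/weights")
    print(f"{len(EXPECTED_WEIGHTS) - len(missing)} of {len(EXPECTED_WEIGHTS)} weights found")
    for name in missing:
        print("missing:", name)
```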

	## Usage
	1. `hf download MaybeRichard/OCTFlow --revision v1 --local-dir octflow_v1`
	2. Untar `octflow-raev2-code.tar.gz` and install with `uv sync` (PyTorch 2.10+cu128). Load checkpoints with `torch.load(..., weights_only=False)`.
	3. Place a checkpoint under `pilot/path1/results/<run>/checkpoints/`, set the dataset paths, and run the matching script in `pilot/path1/scripts/foundation/` (e.g. `seg_instr_eval.py`, `zeroshot_spectrum.py`, `denoise_eval.py`).
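
Steps 2 and 3 can be sketched in Python as follows. This is an illustration under assumptions, not part of the released code: `stage_checkpoint`, `describe_checkpoint`, and `load_checkpoint` are hypothetical helpers, the run name is whatever you choose for `<run>`, and the internal layout of the `.pt` files is not documented here, so the summary only reports top-level keys.

```python
import shutil
from pathlib import Path


def stage_checkpoint(weight, code_root, run_name):
    """Copy a downloaded weight into <code_root>/pilot/path1/results/<run_name>/checkpoints/."""
    weight = Path(weight)
    dest_dir = Path(code_root) / "pilot" / "path1" / "results" / run_name / "checkpoints"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / weight.name
    if not dest.exists():
        shutil.copy2(weight, dest)
    return dest


def describe_checkpoint(obj, max_keys=10):
    """Summarize a loaded checkpoint object without assuming its internal layout."""
    if isinstance(obj, dict):
        keys = list(obj.keys())
        return {
            "type": "dict",
            "num_keys": len(keys),
            "first_keys": [str(k) for k in keys[:max_keys]],
        }
    return {"type": type(obj).__name__}


def load_checkpoint(path):
    """Load a checkpoint on CPU with the flag named in step 2."""
    import torch  # imported lazily so the helpers above work without torch installed

    # weights_only=False unpickles arbitrary Python objects,
    # so only load files from a source you trust.
    return torch.load(path, map_location="cpu", weights_only=False)
```

A typical call would be `stage_checkpoint("octflow_v1/weights/sd3_vb_denoise_100k_step4000.pt", "<code root>", "<run>")` followed by `describe_checkpoint(load_checkpoint(dest))`, with `<code root>` and `<run>` filled in for your setup.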

	## Notes / limitations
	- Pilot-scale; not clinically validated (external multi-center validation + reader study are future work).
	- Built on gated `stabilityai/stable-diffusion-3-medium-diffusers` (accept its license).
	- Env: base conda, torch 2.10.0+cu128, diffusers 0.37, transformers 5.3.
