---
license: other
tags:
- ophthalmology
- OCT
- fundus
- medical-imaging
- diffusion
- stable-diffusion-3
- segmentation
- instruction-tuning
library_name: diffusers
---
# OCTFlow v1: Unified Generative Instruction Foundation Model for Ophthalmic Imaging
**Frozen v1 · 2026-06-21.** OCTFlow is a single instruction-tuned ophthalmic foundation model (Vision-Banana style) built on SD3-medium (16-channel KL-VAE + 2B MMDiT, rectified flow). One model handles generation, multi-scheme segmentation, and denoising across 9 imaging modalities; you switch tasks by changing the **text prompt**. Download with `hf download MaybeRichard/OCTFlow --revision v1`.
## What it does (one base model, switch tasks by prompt)
- **Generation (T2I)**: 9 modalities (OCT B-scan, color fundus, SLO, UWF fundus, OCTA en-face, OCT en-face, FA, slit-lamp, IR-SLO).
- **Instruction segmentation**: OCT retinal layers (9/5/3-layer schemes, **arbitrary unseen layer counts**, single-layer selection, any colors), OCT fluid (IRF/SRF/PED), OCTA FAZ and large vessels, fundus vessels and disc/cup, OCTA500 5-layer.
- **Denoising**: OCT speckle removal (generative image-to-image).
- **Disease classification**: frozen-feature linear probe (14 OCT/fundus/UWF tasks).
- **Semantic synthesis and data augmentation**: mask→image generation; labeled synthetic data for sharing and privacy protection.
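The disease-classification capability is a frozen-feature linear probe: the backbone stays fixed and only a linear classifier is trained on its features. The NumPy sketch below illustrates that idea, assuming features were already extracted into an `(n_samples, dim)` array. The random features are synthetic stand-ins, and the standardization, full-batch gradient descent, and hyperparameters are illustrative assumptions, not the recipe behind the reported numbers.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=300, weight_decay=1e-4):
    """Fit a softmax classifier on frozen features; the backbone is never updated."""
    n, d = features.shape
    # Standardize with training-set statistics (an assumed, common probe choice).
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8
    x = (features - mean) / std
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad + weight_decay * w)
        b -= lr * grad.sum(axis=0)
    return w, b, mean, std


def probe_top1(features, labels, w, b, mean, std):
    """Top-1 accuracy of the trained probe on (features, labels)."""
    preds = (((features - mean) / std) @ w + b).argmax(axis=1)
    return float((preds == labels).mean())


# Synthetic stand-in for frozen backbone features (hypothetical data).
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 64)) * 3.0
labels = rng.integers(0, 3, size=300)
feats = centers[labels] + rng.normal(size=(300, 64))
w, b, mean, std = train_linear_probe(feats, labels, num_classes=3)
acc = probe_top1(feats, labels, w, b, mean, std)
```

Because only `w` and `b` are learned, probe accuracy measures how linearly separable the frozen features already are, which is why it is a common way to compare backbones.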
## Honest results (see `results/octflow_downstream_report.html` and `EXPERIMENT_LOG.md`)
Per task, OCTFlow is **competitive but not SOTA**; its genuine differentiators are unification, zero-shot instruction generalization, and shareable synthetic data.
- **Classification (14 tasks, linear probe, mean top-1 accuracy)**: DINOv2 0.849 ≥ RETFound 0.844 ≥ **ours 0.837** > DINOv3 0.824 > MIRAGE 0.808 > VisionFM 0.796. Ours ranks 3rd of 7 and is never #1 on any single task.
- **Segmentation (vs. end-to-end fine-tuned FMs at 512 resolution, fair comparison)**: competitive, but loses most tasks to the strongest fine-tuned FM (especially DINOv2) by 0.03–0.09; wins OCTA large vessel and ties FIVES vessel.
- **Denoising**: fine-tuned FMs match or beat ours on PSNR/SSIM; ours wins on perceptual LPIPS (0.135 vs. 0.37–0.41; lower is better).
- **Data augmentation (classification §5 / segmentation §6)**: synthetic data gives raw gains roughly equal to or smaller than classical augmentation; its value lies in shareable, customizable data rather than metric gains.
- **★ Zero-shot instruction generalization (the moat)**: trained only on the 9/5/3-layer schemes, the model reaches mIoU **0.441** on **unseen** layer counts (8/7/6/4/2), on par with **0.430** on seen counts; cross-device zero-shot on OCTA500 (never trained on) gives binary retina IoU **0.897**. A fixed-head discriminative FM or U-Net cannot do this, because its number of output classes is fixed at training time.
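The zero-shot results are reported as mIoU and binary IoU, and the denoising comparison uses PSNR. The sketch below shows the standard definitions in NumPy, assuming integer label maps (class 0 as background) and images scaled to [0, 1]. Whether the report's mIoU includes background, or averages per image rather than over the dataset, is not stated here, so treat this as an illustration of the metrics, not the evaluation code. SSIM and LPIPS are omitted; LPIPS needs a pretrained network.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """IoU for each class id in [0, num_classes); NaN if a class is absent from both maps."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

With a two-class map (background vs. retina), `per_class_iou(pred, target, 2)[1]` is the binary retina IoU. Because `num_classes` is just an argument, the same code scores 9-layer, 5-layer, or unseen 7-layer outputs alike.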
## Weights (`weights/`, optimizer state stripped, bf16)
| file | role |
|---|---|
| `sd3_multimodal_base_v2_step240000.pt` | T2I base (generation + probe backbone) |
| `sd3_oct_stageA_v3_step20000.pt` | OCT domain-adaptation init (warm start) |
| `sd3_vb_stageC_v3a_step30000.pt` | **instruction model** (multi-scheme seg + zero-shot, §7) |
| `sd3_vb_layer_100k_step30000.pt` | 9-layer seg anchor (v3a recipe) |
| `sd3_vb_layer_100k_v3b_step15000.pt` | 9-layer seg + decoded loss (sharper thin layers) |
| `sd3_vb_mask2img_step12000.pt` | mask→image generator (§6 semantic synthesis) |
| `sd3_vb_denoise_100k_step4000.pt` | OCT denoiser |
| `sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt` | downstream seg specialists |
| `sd3_vb_octa500_100k_step4000.pt` | OCTA500 5-layer specialist |
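The specialist row uses shell-style brace shorthand for five files. A small lookup table makes the mapping explicit; in this sketch the task keys are hypothetical labels chosen for illustration, while the file names are copied from the table above. Expanding the pattern yields 13 checkpoint files in total.

```python
import re


def expand_braces(pattern):
    """Expand shell-style brace groups: 'a_{x,y}.pt' -> ['a_x.pt', 'a_y.pt']."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))  # handles further groups
    return expanded


def step_of(filename):
    """Return N from a '..._step<N>.pt' file name."""
    match = re.search(r"_step(\d+)\.pt$", filename)
    if match is None:
        raise ValueError(f"no _step<N>.pt suffix in {filename!r}")
    return int(match.group(1))


# Hypothetical task keys; the file names are copied from the weights table.
CHECKPOINTS = {
    "t2i_base": "sd3_multimodal_base_v2_step240000.pt",
    "oct_warm_start": "sd3_oct_stageA_v3_step20000.pt",
    "instruction_seg": "sd3_vb_stageC_v3a_step30000.pt",
    "layer9_anchor": "sd3_vb_layer_100k_step30000.pt",
    "layer9_decoded_loss": "sd3_vb_layer_100k_v3b_step15000.pt",
    "mask2img": "sd3_vb_mask2img_step12000.pt",
    "denoise": "sd3_vb_denoise_100k_step4000.pt",
    "octa500": "sd3_vb_octa500_100k_step4000.pt",
}
for name in expand_braces("sd3_vb_{disccup,faz,fluid,octavessel,vessel}_v3b_step4000.pt"):
    # "sd3_vb_fluid_v3b_step4000.pt" -> key "seg_fluid"
    CHECKPOINTS["seg_" + name.split("_")[2]] = name
```

`step_of` assumes the `_step<N>` suffix is the training step, which the file names suggest but the card does not state explicitly.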
## Usage
1. `hf download MaybeRichard/OCTFlow --revision v1 --local-dir octflow_v1`
2. Untar `octflow-raev2-code.tar.gz` and install dependencies with `uv sync` (PyTorch 2.10 + CUDA 12.8). Load checkpoints with **`torch.load(..., weights_only=False)`**; recent PyTorch releases default to `weights_only=True`.
3. Copy a weight into `pilot/path1/results/<run>/checkpoints/`, set the dataset paths, and run the matching script in `pilot/path1/scripts/foundation/` (e.g. `seg_instr_eval.py`, `zeroshot_spectrum.py`, `denoise_eval.py`).
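Step 3 can be scripted. The helper below is a minimal sketch that copies one downloaded weight into the expected checkpoints folder and loads it on CPU. `stage_checkpoint`, `load_checkpoint`, and the `run_name` argument are hypothetical: which `<run>` name each evaluation script expects is not documented here, and nothing is assumed about the checkpoint's internal structure.

```python
import shutil
from pathlib import Path


def stage_checkpoint(download_dir, repo_root, run_name, weight_file):
    """Copy <download_dir>/weights/<weight_file> to pilot/path1/results/<run_name>/checkpoints/."""
    src = Path(download_dir) / "weights" / weight_file
    if not src.is_file():
        raise FileNotFoundError(f"missing checkpoint: {src}")
    dst_dir = Path(repo_root) / "pilot" / "path1" / "results" / run_name / "checkpoints"
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / weight_file
    shutil.copy2(src, dst)  # copy (not move) so the download stays intact
    return dst


def load_checkpoint(path):
    """Load a staged checkpoint on CPU (requires PyTorch)."""
    import torch  # imported lazily so staging works without PyTorch installed

    # weights_only=False (as this card requires) unpickles arbitrary Python
    # objects, so only use it on files from a source you trust.
    return torch.load(path, map_location="cpu", weights_only=False)
```

For example, `stage_checkpoint("octflow_v1", code_dir, "my_run", "sd3_vb_denoise_100k_step4000.pt")` places the denoiser where step 3 expects it, with `code_dir` being wherever the tarball was extracted and `"my_run"` a placeholder run name.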
## Notes / limitations
- Pilot-scale; **not clinically validated** (external multi-center validation and a reader study are future work).
- Built on the gated `stabilityai/stable-diffusion-3-medium-diffusers` (you must accept its license).
- Environment: base conda env, torch 2.10.0+cu128, diffusers 0.37, transformers 5.3.