MaybeRichard/OCTFlow
OCTFlow Path-1 code + stripped weights (Stage A* + v3a + v1/v2)
README.md
ADDED
@@ -0,0 +1,70 @@
---
license: other
tags:
- oct
- ophthalmology
- segmentation
- stable-diffusion-3
- instruction-tuning
- medical-imaging
---

# OCTFlow: Path 1 (SD3 backbone) code + weights

Reusable code and checkpoints for the OCTFlow pilot: an ophthalmic multimodal
generative model that does **prompt-controlled OCT retinal-layer segmentation**
(Vision-Banana-style instruction tuning on a Stable Diffusion 3 medium backbone).

This repo is for **continuing the work on a new machine**; the dataset is hosted
separately. Optimizer state has been stripped from the checkpoints (warm-start and
inference only need `model` weights).

## Contents

| File | What |
|---|---|
| `octflow-raev2-code.tar.gz` | Full RAEv2 working tree (src/ engine + pilot/path1/ Path-1 code, configs, scripts). Excludes results/, .git/, pretrained_models/, data/. |
| `weights/sd3_oct_stageA_v3_step20000.pt` | **Stage A\*** - SD3 medium fine-tuned on Topcon OCT (T2I domain adaptation). The warm-start base for all Stage C runs. |
| `weights/sd3_vb_stageC_v3a_step30000.pt` | **v3a (best)** - multi-prompt instruction tuning. Follows prompts for 9/5/3-layer + arbitrary colors + single-layer selection; zero-shot adapts to new layer schemes. |
| `weights/sd3_vb_stageC_v1_step20000.pt` | (optional) v1 specialist, prob_seg=0.3, single fixed 10-color prompt. |
| `weights/sd3_vb_stageC_v2_step20000.pt` | (optional) v2 specialist, prob_seg=0.5. |

Each `.pt` holds `{step, model, ema, config}` (no optimizer). `model` is a
`SD3Transformer2DModel` with `pos_embed.proj` expanded 16→32 input channels
(channel-concat image conditioning).
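
A quick way to sanity-check a downloaded checkpoint against the layout described above is sketched below. It is not part of the repo's code. It assumes that `model` is stored as a state dict keyed by parameter name and that the patch-embedding weight lives under `pos_embed.proj.weight`; both the key name and the helper names `load_checkpoint` / `describe_checkpoint` are illustrative assumptions.

```python
EXPECTED_KEYS = {"step", "model", "ema", "config"}


def load_checkpoint(path):
    """Load a checkpoint onto the CPU. Requires PyTorch."""
    import torch  # lazy import: describe_checkpoint also works on plain dicts

    # weights_only=False is an assumption: `config` may hold non-tensor objects.
    # Only use it on files you trust, since it unpickles arbitrary Python.
    return torch.load(path, map_location="cpu", weights_only=False)


def describe_checkpoint(ckpt, proj_key="pos_embed.proj.weight"):
    """Summarize a checkpoint dict and report the patch-embedding input channels."""
    missing = EXPECTED_KEYS - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
    # Assumption: `model` is a state dict keyed by parameter name.
    weight = ckpt["model"][proj_key]
    # A Conv2d weight has shape (out_channels, in_channels, kH, kW).
    embed_dim, in_channels = tuple(weight.shape)[:2]
    return {
        "step": ckpt["step"],
        "has_optimizer": "optimizer" in ckpt,
        "embed_dim": embed_dim,
        "in_channels": in_channels,
    }
```

If the assumptions hold, `describe_checkpoint(load_checkpoint("weights/sd3_vb_stageC_v3a_step30000.pt"))` should report `in_channels` of 32 and `has_optimizer` of `False`, matching the description above. A `KeyError` on `proj_key` would suggest that `model` is stored in a different form.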

## Key results (v3a)

- **Instruction following**: prompt 9/5/3 layers → outputs 6.95/4.36/2.85 layers; shuffled-color prompt mIoU 0.456 ≈ canonical 0.461 (the model reads the prompt's color map).
- **Cross-device zero-shot (OCTA500, native 5-layer prompt)**: binary retina IoU **0.538 → 0.897** vs the single-prompt pilot.
- **Per-scheme mIoU (incl. bg, N=150)**: 9-layer 0.461 / 5-layer 0.526 / 3-layer 0.610.
- **vs OCT-RAE backbone**: 10-class strict mIoU 0.023 → 0.507 (22×).
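
For readers unfamiliar with how a color-coded segmentation output turns into an mIoU number, here is a minimal pure-Python sketch. It is not the repo's evaluation code: the nearest-color decoding, the toy palette, and the rule of skipping classes absent from both prediction and target are all assumptions made for illustration, and the "strict" variant quoted above may be defined differently.

```python
def decode_palette(pixels, palette):
    """Map each RGB pixel to the index of the nearest palette color."""
    def nearest(rgb):
        return min(
            range(len(palette)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(rgb, palette[k])),
        )
    return [nearest(p) for p in pixels]


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: background plus two layers, four pixels (hypothetical palette).
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]
generated = [(3, 2, 1), (250, 10, 5), (240, 0, 0), (10, 250, 0)]
pred = decode_palette(generated, palette)
target = [0, 1, 2, 2]
score = mean_iou(pred, target, num_classes=3)
```

Because the palette comes from the prompt, the same decoding step applies to shuffled-color prompts as long as the palette passed to `decode_palette` matches the prompt's color map.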

## Restore on a new server

```bash
# 1. download this repo
hf download <this-repo-id> --repo-type model --local-dir octflow_restore

# 2. unpack code
mkdir RAEv2 && tar xzf octflow_restore/octflow-raev2-code.tar.gz -C RAEv2
cd RAEv2

# 3. env (uv) + put weights back where run.sh expects them
#    (octflow_restore/ now sits one level up, next to RAEv2/)
uv sync # or: conda env + pip install diffusers transformers torch ...
mkdir -p pilot/path1/results/sd3_oct_stageA_v3/checkpoints
mkdir -p pilot/path1/results/sd3_vb_stageC_v3a/checkpoints
cp ../octflow_restore/weights/sd3_oct_stageA_v3_step20000.pt pilot/path1/results/sd3_oct_stageA_v3/checkpoints/step-0020000.pt
cp ../octflow_restore/weights/sd3_vb_stageC_v3a_step30000.pt pilot/path1/results/sd3_vb_stageC_v3a/checkpoints/step-0030000.pt

# 4. point configs/scripts at the new dataset root, then see pilot/path1/run.sh
```

SD3 medium base weights (`stabilityai/stable-diffusion-3-medium-diffusers`) are
downloaded from HF at runtime, not bundled here.

## Reproduce / next step

The full pipeline is `pilot/path1/run.sh`. The next planned step is **v3b**: a
decoded-space loss (palette CE + soft Dice + thin-layer weighting) to fix the
generalist tax and weak thin layers (RPE/GCL). Clinical scope is the macula.
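
To make the v3b idea concrete, the sketch below combines a class-weighted cross-entropy with a class-weighted soft Dice term over per-pixel class probabilities. It is only an illustration of the general technique, not the planned implementation: how probabilities would be derived from decoded pixels and the palette, the weight values, and the `dice_weight` mixing factor are all assumptions. A real version would operate on tensors rather than Python lists.

```python
import math


def weighted_cross_entropy(probs, target, class_weights, eps=1e-8):
    """Class-weighted CE; probs[i][c] is the probability of class c at pixel i."""
    num = sum(-class_weights[t] * math.log(max(p[t], eps)) for p, t in zip(probs, target))
    den = sum(class_weights[t] for t in target)
    return num / den


def weighted_soft_dice(probs, target, class_weights, eps=1e-6):
    """Class-weighted soft Dice loss (1 - Dice), averaged over classes."""
    total = 0.0
    for c, w in enumerate(class_weights):
        inter = sum(p[c] for p, t in zip(probs, target) if t == c)
        denom = sum(p[c] for p in probs) + sum(1 for t in target if t == c)
        dice = (2 * inter + eps) / (denom + eps)
        total += w * (1.0 - dice)
    return total / sum(class_weights)


def decoded_space_loss(probs, target, class_weights, dice_weight=1.0):
    """CE + dice_weight * soft Dice, with larger weights for thin layers."""
    ce = weighted_cross_entropy(probs, target, class_weights)
    dice = weighted_soft_dice(probs, target, class_weights)
    return ce + dice_weight * dice


# Hypothetical weights: class 1 stands in for a thin layer and is up-weighted.
weights = [1.0, 3.0]
good = decoded_space_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], weights)
bad = decoded_space_loss([[0.0, 1.0], [1.0, 0.0]], [0, 1], weights)
```

Up-weighting thin classes in both terms means a missed thin layer costs more than a comparable error on a thick layer, which is the intent behind "thin-layer weighting".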
octflow-raev2-code.tar.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f1f4703eb902ed2849f291e8f96d14433c70db8a311063f8808dadbff2c57305
size 1072660574
weights/sd3_oct_stageA_v3_step20000.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:794136b2ccfe297adf566969366ddaab9c6fe0f9e7eb50aecc1c018dd34dc440
size 8340371276
weights/sd3_vb_stageC_v1_step20000.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26b4d8c76a59f7622e5641fe6200831cfe32f3bbe0cb2d17e2ccc9e925b2f97a
size 8340762482
weights/sd3_vb_stageC_v2_step20000.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd2999818ef21280b808cb2f2404c86652a2d461eae8a09a56cc94ea82367874
size 8340762482
weights/sd3_vb_stageC_v3a_step30000.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e34057970598b7b2f89180c5f7461a146a8c2da2037122a8fd5037506234cf71
size 8340763852