---
license: cc-by-4.0
---
<div id="top" align="center">
# SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025)
[arXiv](https://arxiv.org/abs/2502.07945)
[Paper](https://rdcu.be/em4E2)
[Hugging Face](https://huggingface.co/SsharvienKumar/SurGrID)
</div>
## 💡Key Features
- We show that scene graphs (SGs) can encode surgical scenes in a human-readable format.
- We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings are used to condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation.
- We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study involving clinical experts.
## 🛠 Setup
```bash
git clone https://github.com/MECLabTUDA/SurGrID.git
cd SurGrID
conda create -n surgrid python=3.8.5 pip=20.3.3
conda activate surgrid
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
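Once the install finishes, it can help to confirm that the pinned packages are importable and that CUDA is visible before starting a long training run. The snippet below is a minimal sketch and not part of the repository; the function names `describe_environment` and `cuda_available` are made up for illustration.

```python
import importlib


def describe_environment(packages=("torch", "torchvision")):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


def cuda_available():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version or 'not installed'}")
    print(f"CUDA available: {cuda_available()}")
```

With the setup above, the reported versions would be expected to start with `2.0.1` and `0.15.2`; a `None` entry suggests the wrong conda environment is active.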
## 🏁 Model Checkpoints and Dataset
Download the checkpoints of all required models from the sources below and place them in [`results`](./results). We also provide the processed CADIS dataset, containing images, segmentation masks, and their scene graphs. Update the dataset paths in [`configs`](./configs).
- `Checkpoints`: [VQGANs, GraphEncoders, Diffusion Model](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/checkpoints)
- `Processed Dataset`: [CADIS](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/dataset)
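As an alternative to downloading through the browser, the files could be fetched with the `huggingface_hub` package. The sketch below assumes `huggingface_hub` is installed (it is not confirmed to be in `requirements.txt`) and uses a hypothetical helper name, `download_subfolder`. The files keep the repository's folder structure, so checkpoints would land in `results/checkpoints/`; whether the configs expect exactly this layout is an assumption, and paths may need adjusting.

```python
from pathlib import Path

REPO_ID = "SsharvienKumar/SurGrID"  # the Hugging Face repository linked above


def download_subfolder(subfolder, target_dir="results"):
    """Download one top-level folder of the repository into target_dir.

    Returns the local path reported by huggingface_hub.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{subfolder}/*"],  # restrict the download to this folder
        local_dir=target,
    )
```

Calling `download_subfolder("checkpoints")` and `download_subfolder("dataset", target_dir="data")` would fetch the two folders linked above; `data` is only an example destination.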
## 💥 Sampling SurGrID
```bash
python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml
```
## ⏳ Training SurGrID
**Step 1:** Train Separate VQGANs for Images and Segmentation Masks
```bash
python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0,
python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0,
```
**Step 2:** Train Both Graph Encoders
```bash
python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml
python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml
```
**Step 3:** Train Diffusion Model
```bash
python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml
```
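The three steps can also be chained from a small driver script. This is a sketch under assumptions, not a script shipped with the repository: it assumes the commands are run from the repository root, in the order listed, and it does not update any checkpoint paths in the configs between steps, which may be required. `TRAINING_STEPS`, `build_commands` and `run_all` are illustrative names.

```python
import subprocess
import sys

# Arguments copied from Steps 1-3 above, in the order they are listed.
# The trailing comma in "0," is kept exactly as in the original commands.
TRAINING_STEPS = [
    ["surgrid/taming/main.py", "--base", "configs/vqgan/vqgan_image_cadis.yaml", "-t", "--gpus", "0,"],
    ["surgrid/taming/main.py", "--base", "configs/vqgan/vqgan_segmentation_cadis.yaml", "-t", "--gpus", "0,"],
    ["script/trainer_graph.py", "--mode", "masked", "--conf", "configs/graph/graph_cadis.yaml"],
    ["script/trainer_graph.py", "--mode", "segclip", "--conf", "configs/graph/graph_cadis.yaml"],
    ["script/trainer_diffusion.py", "--conf", "configs/trainer/combined_emb.yaml"],
]


def build_commands(python=sys.executable):
    """Prefix every step with the interpreter that should run it."""
    return [[python, *args] for args in TRAINING_STEPS]


def run_all(dry_run=False):
    """Run the steps one after another, stopping at the first failure."""
    for command in build_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits with an error.
            subprocess.run(command, check=True)
```

`run_all(dry_run=True)` only prints the five commands, which is a cheap way to review them before committing GPU time.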
## 🔄 Training SurGrID on a New Dataset
The following files need to be adapted:
- [Configs](./configs)
- [SurGrID Dataset](./surgrid/dataset/cadis_dataset.py)
- [VQGAN Dataset](./surgrid/taming/taming/data/cadis.py)
- [CADIS Specifications in Graph Encoder Pre-training](./surgrid/graph/graph_masked_segclip.py)
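Before adapting the dataset classes, it may be worth checking that every sample in the new dataset has all three parts of the (image, mask, SG) triplet. The sketch below assumes a purely hypothetical layout with one sub-folder per part (`images`, `masks`, `graphs`) and matching file stems; the processed CADIS dataset may be organised differently, so the folder names would need to be changed accordingly.

```python
from pathlib import Path


def find_incomplete_samples(root, parts=("images", "masks", "graphs")):
    """Return {stem: [missing parts]} for samples that lack at least one part.

    A sample is identified by its file stem, e.g. 'frame_001' for
    'images/frame_001.png'. The folder names are illustrative assumptions.
    """
    root = Path(root)
    stems_per_part = {
        part: {path.stem for path in (root / part).glob("*") if path.is_file()}
        for part in parts
    }
    all_stems = set().union(*stems_per_part.values())

    incomplete = {}
    for stem in sorted(all_stems):
        missing = [part for part in parts if stem not in stems_per_part[part]]
        if missing:
            incomplete[stem] = missing
    return incomplete
```

An empty result means every stem appears in all three folders; any other result lists which part is missing for which sample.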
## 🥼 Clinical Expert Assessment
```bash
python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml
```
Our demo GUI allows for loading ground-truth graphs along with the ground-truth image. The graph’s nodes can be moved, deleted, or have their class changed. We instruct our participants to load four different ground-truth graphs and sequentially perform the following actions on each. They are requested to score the samples’ realism and coherence with the graph input using a Likert scale of 1 to 7:
- First, participants are instructed to generate a batch of four samples from the ground-truth SG without modifications.
- Second, the participants are requested to spatially move nodes in the canvas and again judge the synthesised samples.
- Third, participants change the class of one of the instrument nodes and judge the generated images.
- Lastly, participants are instructed to remove one of the instruments or miscellaneous classes and judge the synthesised image a final time.
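The three kinds of edits in this protocol (moving a node, changing its class, removing it) can be illustrated on a toy scene graph. The sketch below is not the data structure used by the demo GUI; the `Node` and `SceneGraph` classes, the normalised coordinates, and the class labels are all hypothetical and chosen only to make the operations concrete.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Node:
    label: str  # class name (illustrative, not the actual CADIS label set)
    x: float    # horizontal position, assumed normalised to [0, 1]
    y: float    # vertical position, assumed normalised to [0, 1]


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: set = field(default_factory=set)    # (source id, target id) pairs

    def move(self, node_id, x, y):
        """Spatial modification: relocate a node, keeping its class."""
        self.nodes[node_id] = replace(self.nodes[node_id], x=x, y=y)

    def change_class(self, node_id, label):
        """Tool modification: swap the class, keeping the position."""
        self.nodes[node_id] = replace(self.nodes[node_id], label=label)

    def remove(self, node_id):
        """Tool removal: delete the node and every edge touching it."""
        del self.nodes[node_id]
        self.edges = {edge for edge in self.edges if node_id not in edge}


graph = SceneGraph(
    nodes={0: Node("pupil", 0.5, 0.5), 1: Node("forceps", 0.3, 0.4)},
    edges={(0, 1)},
)
graph.move(1, 0.6, 0.4)          # second action in the protocol
graph.change_class(1, "cannula")  # third action
graph.remove(1)                   # final action
```

In the study, each edited graph would then be passed to the diffusion model to synthesise a new batch of images for the participant to score.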
<table>
<thead>
<tr>
<th rowspan="2">Clinician</th>
<th colspan="2">Synthesisation from GT</th>
<th colspan="2">Spatial Modification</th>
<th colspan="2">Tool Modification</th>
<th colspan="2">Tool Removal</th>
</tr>
<tr>
<th>Realism</th>
<th>Coherence</th>
<th>Realism</th>
<th>Coherence</th>
<th>Realism</th>
<th>Coherence</th>
<th>Realism</th>
<th>Coherence</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1</td>
<td>6.5±0.5</td>
<td>6.5±1.0</td>
<td>6.3±0.9</td>
<td>6.3±0.9</td>
<td>5.3±1.2</td>
<td>4.5±1.9</td>
<td>6.3±0.9</td>
<td>5.5±2.3</td>
</tr>
<tr>
<td>P2</td>
<td>5.3±0.9</td>
<td>5.3±0.5</td>
<td>4.5±0.5</td>
<td>4.3±2.0</td>
<td>5.3±0.9</td>
<td>5.8±0.9</td>
<td>5.5±1.2</td>
<td>5.5±1.9</td>
</tr>
<tr>
<td>P3</td>
<td>6.3±0.9</td>
<td>6.3±0.9</td>
<td>6.5±1.0</td>
<td>5.5±0.5</td>
<td>6.0±0.8</td>
<td>6.8±0.5</td>
<td>6.3±0.5</td>
<td>6.5±0.5</td>
</tr>
</tbody>
</table>
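Each cell in the table has the form mean±standard deviation. The snippet below sketches how such a summary could be computed from one participant's scores for one condition. The scores are invented, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state which estimator or rounding was used.

```python
from statistics import mean, stdev


def summarise(scores):
    """Format Likert scores (1 to 7) as 'mean±std' with one decimal place."""
    if len(scores) < 2:
        raise ValueError("at least two scores are needed for a standard deviation")
    if any(not 1 <= score <= 7 for score in scores):
        raise ValueError("Likert scores must lie between 1 and 7")
    # stdev is the sample standard deviation (n - 1 in the denominator).
    return f"{mean(scores):.1f}±{stdev(scores):.1f}"


# Hypothetical realism scores for the four ground-truth graphs of one condition.
print(summarise([4, 5, 6, 5]))  # prints 5.0±0.8
```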
## 📜 Citations
If you use SurGrID in your work, please cite the following paper:
```bibtex
@article{frisch2025surgrid,
title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion},
author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
journal={arXiv preprint arXiv:2502.07945},
year={2025}
}
```
## ⭐ Acknowledgement
Thanks to the following projects and theoretical works that we have used or been inspired by:
- [VQGAN](https://github.com/CompVis/taming-transformers)
- [Lucidrains' DDPM](https://github.com/lucidrains/denoising-diffusion-pytorch)
- [SGDiff](https://github.com/YangLing0818/SGDiff)
- [Endora's README](https://github.com/CUHK-AIM-Group/Endora)