---
license: cc-by-4.0
---
<div id="top" align="center">

# SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025)

  [![arXiv](https://img.shields.io/badge/arXiv-2502.07945-b31b1b.svg)](https://arxiv.org/abs/2502.07945)
  [![Paper](https://img.shields.io/badge/Paper-Visit-blue)](https://rdcu.be/em4E2)
  [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/SsharvienKumar/SurGrID)

</div>


## 💡Key Features
- We show that scene graphs (SGs) can encode surgical scenes in a human-readable format.
- We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation.
- We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study involving clinical experts.
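To make the first point concrete, here is a minimal sketch of how a segmentation mask could be turned into a human-readable scene graph. It assumes a hypothetical node schema (class name, pixel centroid, area) and hypothetical "left of" / "above" spatial edges derived from centroids. SurGrID's actual graph construction may use different node attributes and relations.

```python
from dataclasses import dataclass

# Hypothetical class IDs for a toy mask; this is NOT the CADIS label set.
CLASS_NAMES = {0: "background", 1: "pupil", 2: "cannula", 3: "forceps"}


@dataclass
class Node:
    label: str
    cx: float  # centroid column (x)
    cy: float  # centroid row (y)
    area: int  # number of pixels belonging to the class


def mask_to_nodes(mask):
    """Create one node per non-background class present in a 2D mask (list of rows)."""
    acc = {}
    for y, row in enumerate(mask):
        for x, cls in enumerate(row):
            if cls == 0:
                continue  # skip background pixels
            sx, sy, n = acc.get(cls, (0, 0, 0))
            acc[cls] = (sx + x, sy + y, n + 1)
    return [Node(CLASS_NAMES[c], sx / n, sy / n, n) for c, (sx, sy, n) in sorted(acc.items())]


def spatial_edges(nodes):
    """Hypothetical relations: 'left of' / 'above' from pairwise centroid comparisons."""
    edges = []
    for a in nodes:
        for b in nodes:
            if a is b:
                continue
            if a.cx < b.cx:
                edges.append((a.label, "left of", b.label))
            if a.cy < b.cy:
                edges.append((a.label, "above", b.label))
    return edges


mask = [
    [0, 0, 2, 2],
    [1, 1, 0, 0],
    [1, 1, 0, 3],
]
nodes = mask_to_nodes(mask)
for triplet in spatial_edges(nodes):
    print(*triplet)  # e.g. "pupil left of cannula"
```

Printing subject-relation-object triplets is what makes such a graph readable by a clinician without looking at the mask.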


## 🛠 Setup
```bash
git clone https://github.com/MECLabTUDA/SurGrID.git
cd SurGrID
conda create -n surgrid python=3.8.5 pip=20.3.3
conda activate surgrid

pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
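After activating the environment, a quick standard-library check can confirm that the pinned PyTorch builds were actually installed. This is a convenience sketch, not a script from the repository. The handling of the `+cu118` suffix assumes the wheels come from the CUDA 11.8 index used above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the setup commands above.
PINNED = {"torch": "2.0.1", "torchvision": "0.15.2"}


def check_pins(pins):
    """Return a list of human-readable problems; an empty list means all pins match."""
    problems = []
    for pkg, wanted in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed (expected {wanted})")
            continue
        # Wheels from the CUDA index carry a local version tag, e.g. "2.0.1+cu118".
        base = installed.split("+", 1)[0]
        if base != wanted:
            problems.append(f"{pkg}: found {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    print("\n".join(issues) if issues else "All pinned packages match.")
```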

## 🏁 Model Checkpoints and Dataset
Download the checkpoints of all required models from the sources below and place them in [`results`](./results). We also provide the processed CADIS dataset, containing images, segmentation masks and their scene graphs. Update the dataset paths in [`configs`](./configs) accordingly.
- `Checkpoints`: [VQGANs, GraphEncoders, Diffusion Model](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/checkpoints)
- `Processed Dataset`: [CADIS](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/dataset)
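As an alternative to downloading files manually, the sketch below uses `huggingface_hub.snapshot_download` to fetch only the checkpoint or dataset folder from the Hugging Face repository linked above. The subset names and the `local_dir` default are assumptions. Downloaded files keep their repository paths, so checkpoints land under `results/checkpoints/`. Check the paths expected in `configs` afterwards and move files or adjust the configs if needed.

```python
# Sketch only: subset names and default local_dir are assumptions, not repository conventions.
REPO_ID = "SsharvienKumar/SurGrID"
SUBSETS = {
    "checkpoints": ["checkpoints/*"],
    "dataset": ["dataset/*"],
}


def fetch(subset, local_dir="results"):
    """Download one folder of the Hugging Face repo and return the local path."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; choose from {sorted(SUBSETS)}")
    # Imported lazily so the validation above works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=SUBSETS[subset],
        local_dir=local_dir,
    )
```

For example, `fetch("checkpoints")` followed by `fetch("dataset", local_dir="data")` would keep the model weights and the data in separate folders. The hypothetical `data` folder would then need to be referenced in the dataset paths in `configs`.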


## 💥 Sampling SurGrID
```bash
python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml
```


## ⏳ Training SurGrID
**Step 1:** Train Separate VQGANs for Images and Segmentation Masks
```bash
python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0,
python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0,
```
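Each VQGAN compresses its input into a grid of discrete codebook indices. The core quantization step of a VQGAN (as in taming-transformers) maps every encoder feature vector to its nearest codebook entry. The NumPy sketch below shows only that lookup, using a random toy codebook. It omits the encoder, decoder, commitment loss and straight-through gradient that a real VQGAN needs.

```python
import numpy as np


def quantize(z, codebook):
    """Map each feature vector in z (H, W, D) to its nearest codebook entry (K, D).

    Returns the quantized features (H, W, D) and the code indices (H, W).
    """
    h, w, d = z.shape
    flat = z.reshape(-1, d)
    # Squared Euclidean distance between every feature and every code: (H*W, K)
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx].reshape(h, w, d), idx.reshape(h, w)


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # toy codebook: 16 codes of dimension 4
z = rng.normal(size=(2, 3, 4))       # toy encoder output on a 2x3 grid
zq, codes = quantize(z, codebook)
print(codes)
```

Training one VQGAN per modality means images and segmentation masks each get a codebook suited to their own statistics.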

**Step 2:** Train Both Graph Encoders (`masked` and `segclip` modes)
```bash
python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml
python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml
```
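The `segclip` mode name suggests a CLIP-style objective that aligns graph embeddings with embeddings of the paired segmentation masks, complementing the `masked` mode. That reading is an assumption based on the mode name and the pre-training description in the key features. This README does not specify the loss. For orientation only, a generic symmetric InfoNCE (CLIP) loss over a batch of paired embeddings can be sketched as follows.

```python
import numpy as np


def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_loss(graph_emb, seg_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of graph_emb is the positive pair of row i of seg_emb."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    s = seg_emb / np.linalg.norm(seg_emb, axis=1, keepdims=True)
    logits = g @ s.T / temperature  # (B, B) scaled cosine similarities
    diag = np.arange(len(logits))
    loss_g2s = -log_softmax(logits, axis=1)[diag, diag].mean()  # graph -> mask direction
    loss_s2g = -log_softmax(logits, axis=0)[diag, diag].mean()  # mask -> graph direction
    return (loss_g2s + loss_s2g) / 2


rng = np.random.default_rng(0)
graph_emb = rng.normal(size=(8, 32))
aligned = clip_loss(graph_emb, graph_emb + 0.01 * rng.normal(size=(8, 32)))
unrelated = clip_loss(graph_emb, rng.normal(size=(8, 32)))
print(f"aligned: {aligned:.3f}, unrelated: {unrelated:.3f}")
```

Well-aligned pairs yield a much lower loss than unrelated ones, which is the signal such an objective uses to pull each graph towards its own mask.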

**Step 3:** Train Diffusion Model
```bash
python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml
```


## 🔄 Training SurGrID on a New Dataset
The following files need to be adapted:
- [Configs](./configs)
- [SurGrID Dataset](./surgrid/dataset/cadis_dataset.py)
- [VQGAN Dataset](./surgrid/taming/taming/data/cadis.py)
- [CADIS Specifications in Graph Encoder Pre-training](./surgrid/graph/graph_masked_segclip.py)
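When porting to a new dataset, one common adaptation is mapping the new dataset's raw mask label IDs onto the contiguous class set that your configs and graph encoder expect. The sketch below builds a NumPy lookup table for that remapping. The raw IDs, target IDs and the ignore value `255` are hypothetical placeholders, not CADIS values.

```python
import numpy as np

# Hypothetical mapping from raw dataset label IDs to contiguous training IDs.
RAW_TO_TRAIN = {0: 0, 7: 1, 12: 2, 40: 3}
IGNORE = 255  # hypothetical value for raw labels absent from the mapping


def build_lut(mapping, ignore=IGNORE, size=256):
    """Lookup table such that lut[raw_id] == train_id (or `ignore` if unmapped)."""
    lut = np.full(size, ignore, dtype=np.uint8)
    for raw, train in mapping.items():
        lut[raw] = train
    return lut


def remap(mask, lut):
    """Remap an integer mask in a single vectorized indexing step."""
    return lut[mask]


lut = build_lut(RAW_TO_TRAIN)
raw_mask = np.array([[0, 7, 7], [12, 40, 99]], dtype=np.uint8)
print(remap(raw_mask, lut))
```

A lookup table keeps the remapping consistent across the VQGAN data loader and the SurGrID dataset class, since both can share the same mapping.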


## 🥼 Clinical Expert Assessment
```bash
python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml
```
Our demo GUI loads ground-truth graphs together with the corresponding ground-truth image. Nodes in the graph can be moved, deleted, or assigned a different class. We instruct participants to load four different ground-truth graphs and perform the following actions on each in sequence. For each action, they score the samples' realism and their coherence with the input graph on a Likert scale from 1 to 7:

- First, participants generate a batch of four samples from the unmodified ground-truth SG.
- Second, they spatially move nodes on the canvas and judge the newly synthesised samples.
- Third, they change the class of one of the instrument nodes and judge the generated images.
- Lastly, they remove one of the instrument or miscellaneous-class nodes and judge the synthesised images a final time.
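The three edit types in this protocol (moving a node, changing its class, removing it) can be expressed as small operations on a scene graph. The sketch below uses a hypothetical dictionary representation with normalized canvas positions and an edge list. It is not the demo GUI's internal data structure, and the class names are placeholders.

```python
import copy

# Hypothetical scene graph: node id -> attributes, plus (source, relation, target) edges.
graph = {
    "nodes": {
        0: {"cls": "pupil", "pos": (0.50, 0.50)},
        1: {"cls": "cannula", "pos": (0.30, 0.40)},
        2: {"cls": "forceps", "pos": (0.70, 0.60)},
    },
    "edges": [(1, "left of", 0), (0, "left of", 2)],
}


def move_node(g, nid, pos):
    g = copy.deepcopy(g)  # keep the ground-truth graph unchanged
    g["nodes"][nid]["pos"] = pos
    # A fuller version would recompute spatial edges after the move.
    return g


def change_class(g, nid, new_cls):
    g = copy.deepcopy(g)
    g["nodes"][nid]["cls"] = new_cls
    return g


def remove_node(g, nid):
    g = copy.deepcopy(g)
    del g["nodes"][nid]
    # Drop every edge that touches the removed node.
    g["edges"] = [e for e in g["edges"] if nid not in (e[0], e[2])]
    return g


edited = remove_node(change_class(move_node(graph, 1, (0.2, 0.3)), 2, "knife"), 1)
print(edited["nodes"], edited["edges"])
```

Copying before each edit mirrors the protocol, where every modification starts from a graph that can still be compared against its ground truth.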

<table>
  <thead>
    <tr>
      <th rowspan="2">Clinician</th>
      <th colspan="2">Synthesisation from GT</th>
      <th colspan="2">Spatial Modification</th>
      <th colspan="2">Tool Modification</th>
      <th colspan="2">Tool Removal</th>
    </tr>
    <tr>
      <th>Realism</th>
      <th>Coherence</th>
      <th>Realism</th>
      <th>Coherence</th>
      <th>Realism</th>
      <th>Coherence</th>
      <th>Realism</th>
      <th>Coherence</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>P1</td>
      <td>6.5±0.5</td>
      <td>6.5±1.0</td>
      <td>6.3±0.9</td>
      <td>6.3±0.9</td>
      <td>5.3±1.2</td>
      <td>4.5±1.9</td>
      <td>6.3±0.9</td>
      <td>5.5±2.3</td>
    </tr>
    <tr>
      <td>P2</td>
      <td>5.3±0.9</td>
      <td>5.3±0.5</td>
      <td>4.5±0.5</td>
      <td>4.3±2.0</td>
      <td>5.3±0.9</td>
      <td>5.8±0.9</td>
      <td>5.5±1.2</td>
      <td>5.5±1.9</td>
    </tr>
    <tr>
      <td>P3</td>
      <td>6.3±0.9</td>
      <td>6.3±0.9</td>
      <td>6.5±1.0</td>
      <td>5.5±0.5</td>
      <td>6.0±0.8</td>
      <td>6.8±0.5</td>
      <td>6.3±0.5</td>
      <td>6.5±0.5</td>
    </tr>
  </tbody>
</table>
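The table entries appear to follow the common mean±standard deviation format. This README does not state which standard deviation was used or exactly which ratings were pooled per cell. The sketch below shows one way to produce entries in this format. It assumes the population standard deviation over one score per ground-truth graph, and the input scores are hypothetical, not study data.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Format Likert scores as 'mean±std' with one decimal (population std assumed)."""
    return f"{mean(scores):.1f}±{pstdev(scores):.1f}"


# Hypothetical ratings for one clinician and condition (one score per ground-truth graph).
print(summarize([5, 6, 6, 7]))  # prints "6.0±0.7"
```

With only four graphs per participant, a single divergent rating moves the spread noticeably, which is worth keeping in mind when comparing cells such as the coherence scores under tool removal.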


## 📜 Citations
If you use SurGrID in your research, please cite the following paper:
```bibtex
@article{frisch2025surgrid,
  title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion},
  author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban},
  journal={arXiv preprint arXiv:2502.07945},
  year={2025}
}
```


## ⭐ Acknowledgement
We thank the following projects and theoretical works, which we either used or drew inspiration from:
- [VQGAN](https://github.com/CompVis/taming-transformers)
- [Lucidrains' DDPM](https://github.com/lucidrains/denoising-diffusion-pytorch)
- [SGDiff](https://github.com/YangLing0818/SGDiff)
- [Endora's README](https://github.com/CUHK-AIM-Group/Endora)