| --- |
| license: cc-by-4.0 |
| --- |
| <div id="top" align="center"> |
|
|
| # SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion (IPCAI 2025) |
|
|
| [arXiv](https://arxiv.org/abs/2502.07945) |
| [Paper](https://rdcu.be/em4E2) |
| [Hugging Face](https://huggingface.co/SsharvienKumar/SurGrID) |
|
|
| </div> |
|
|
|
|
| ## 💡Key Features |
| - We show that scene graphs (SGs) can encode surgical scenes in a human-readable format. |
| - We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned embeddings are employed to condition graph-to-image diffusion for high-quality and precisely controllable surgical simulation. |
| - We evaluate our generative approach on scenes from cataract surgeries using quantitative fidelity and diversity measurements, followed by an extensive user study involving clinical experts. |
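As a hypothetical illustration of how a segmentation mask can be turned into a human-readable SG, the sketch below creates one node per class present in the mask (class ID, centroid, pixel area) and connects nodes whose centroids lie within a distance threshold. The node attributes, the `max_dist` threshold and the `mask_to_scene_graph` name are assumptions for illustration only; this is not SurGrID's actual graph construction.

```python
import numpy as np

def mask_to_scene_graph(mask: np.ndarray, max_dist: float = 50.0, ignore=(0,)):
    """Hypothetical SG construction: one node per class, edges between nearby centroids."""
    nodes = []
    for cls in np.unique(mask):
        if int(cls) in ignore:  # skip background-like classes
            continue
        ys, xs = np.nonzero(mask == cls)
        nodes.append({
            "class": int(cls),
            "centroid": (float(ys.mean()), float(xs.mean())),  # (row, col)
            "area": int(ys.size),
        })
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            (y1, x1), (y2, x2) = nodes[i]["centroid"], nodes[j]["centroid"]
            if np.hypot(y1 - y2, x1 - x2) <= max_dist:
                edges.append((i, j))
    return {"nodes": nodes, "edges": edges}

mask = np.zeros((8, 8), dtype=np.int64)
mask[0:2, 0:2] = 3  # e.g. an instrument region
mask[6:8, 6:8] = 5  # another class, far away
graph = mask_to_scene_graph(mask, max_dist=4.0)
print(graph["nodes"])
print(graph["edges"])  # [] because the centroids are ~8.5 px apart
```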
|
|
|
|
| ## 🛠 Setup |
| ```bash |
| git clone https://github.com/MECLabTUDA/SurGrID.git |
| cd SurGrID |
| conda create -n surgrid python=3.8.5 pip=20.3.3 |
| conda activate surgrid |
| |
| pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118 |
| pip install -r requirements.txt |
| ``` |
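A minimal sketch for checking that the pinned PyTorch versions from the commands above actually ended up in the active environment. It only reads installed package metadata via the standard library's `importlib.metadata`; the `check_versions` helper is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the pip install command above.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2"}

def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # CUDA builds may carry a local suffix such as "+cu118"; compare the release part only.
        if have is None or have.split("+")[0] != want:
            problems[pkg] = (want, have)
    return problems

if __name__ == "__main__":
    issues = check_versions(EXPECTED)
    print("OK" if not issues else f"Version mismatches: {issues}")
```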
|
|
| ## 🏁 Model Checkpoints and Dataset |
| Download the checkpoints of all required models from the sources below and place them in [`results`](./results). We also provide the processed CADIS dataset, containing images, segmentation masks and their scene graphs. Update the dataset paths in the config files under [`configs`](./configs). |
| - `Checkpoints`: [VQGANs, GraphEncoders, Diffusion Model](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/checkpoints) |
| - `Processed Dataset`: [CADIS](https://huggingface.co/SsharvienKumar/SurGrID/tree/main/dataset) |
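One way to fetch these folders from a script is `snapshot_download` from the `huggingface_hub` package, filtered with `allow_patterns`. This is a sketch assuming the repository layout implied by the links above (`checkpoints/` and `dataset/` folders); the helper names are hypothetical. Downloaded files keep their folder prefix (e.g. `results/checkpoints/...`), so you may still need to move files or adjust the config paths.

```python
REPO_ID = "SsharvienKumar/SurGrID"  # Hugging Face repository linked above

def asset_patterns(include_checkpoints=True, include_dataset=False):
    """Glob patterns selecting which repo folders to download (assumed layout)."""
    patterns = []
    if include_checkpoints:
        patterns.append("checkpoints/*")
    if include_dataset:
        patterns.append("dataset/*")
    return patterns

def download_assets(local_dir, **kwargs):
    """Download the selected folders into local_dir (requires `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # imported lazily so the helpers work without it
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=asset_patterns(**kwargs),
        local_dir=local_dir,
    )

if __name__ == "__main__":
    download_assets("results", include_checkpoints=True)
```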
|
|
|
|
| ## 💥 Sampling SurGrID |
| ```bash |
| python script/sampler_diffusion.py --conf configs/eval/eval_combined_emb.yaml |
| ``` |
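For orientation only, a generic DDPM ancestral-sampling loop (Ho et al., 2020) conditioned on an embedding is sketched below. The linear beta schedule, `T`, the `eps_model(x, t, cond)` signature and the dummy denoiser are assumptions; the sampler, noise schedule and conditioning used by `script/sampler_diffusion.py` may differ.

```python
import numpy as np

def linear_betas(T, start=1e-4, end=0.02):
    return np.linspace(start, end, T)

def ddpm_sample(eps_model, cond, shape, T=50, seed=0):
    """Ancestral DDPM sampling with sigma_t^2 = beta_t."""
    rng = np.random.default_rng(seed)
    betas = linear_betas(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)  # predicted noise, conditioned on e.g. a graph embedding
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # add fresh noise at every step except the last one
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape) if t > 0 else mean
    return x

def dummy_eps_model(x, t, cond):
    """Stand-in for a trained, conditioned denoiser."""
    return np.zeros_like(x)

sample = ddpm_sample(dummy_eps_model, cond=np.ones(16), shape=(4, 8, 8))
print(sample.shape)  # (4, 8, 8)
```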
|
|
|
|
| ## ⏳ Training SurGrID |
| **Step 1:** Train Separate VQGANs for Images and Segmentation Masks |
| ```bash |
| python surgrid/taming/main.py --base configs/vqgan/vqgan_image_cadis.yaml -t --gpus 0, |
| python surgrid/taming/main.py --base configs/vqgan/vqgan_segmentation_cadis.yaml -t --gpus 0, |
| ``` |
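The core of a VQGAN is vector quantization: each encoder output vector is replaced by its nearest entry in a learned codebook. The sketch below shows only this lookup step in NumPy with a toy codebook; the full taming-transformers training (encoder, decoder, perceptual and adversarial losses, straight-through gradients) is not reproduced here.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry (L2 distance).

    z: (N, D) encoder outputs; codebook: (K, D) learned embeddings.
    Returns (indices, quantized vectors).
    """
    # ||z - e||^2 = ||z||^2 - 2 z.e + ||e||^2, computed for all pairs at once
    d = (z ** 2).sum(1, keepdims=True) - 2 * z @ codebook.T + (codebook ** 2).sum(1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]])
idx, zq = quantize(z, codebook)
print(idx)  # [1 0 2]
```

Training one VQGAN per modality, as in the commands above, gives the image and the segmentation mask separate codebooks.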
|
|
| **Step 2:** Train Both Graph Encoders |
| ```bash |
| python script/trainer_graph.py --mode masked --conf configs/graph/graph_cadis.yaml |
| python script/trainer_graph.py --mode segclip --conf configs/graph/graph_cadis.yaml |
| ``` |
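The mode name `segclip` suggests a CLIP-style contrastive objective that aligns graph embeddings with segmentation embeddings; this is an assumption, and the actual loss in `script/trainer_graph.py` may differ (the `masked` mode is not sketched). A symmetric InfoNCE loss over a batch of matching pairs looks like this:

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(graph_emb, seg_emb, temperature=0.07):
    """Symmetric InfoNCE: matching (graph, segmentation) pairs share a row index."""
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    s = seg_emb / np.linalg.norm(seg_emb, axis=1, keepdims=True)
    logits = g @ s.T / temperature  # (B, B) scaled cosine similarities
    diag = np.arange(len(logits))
    loss_g2s = -log_softmax(logits, axis=1)[diag, diag].mean()  # graph -> segmentation
    loss_s2g = -log_softmax(logits, axis=0)[diag, diag].mean()  # segmentation -> graph
    return (loss_g2s + loss_s2g) / 2

rng = np.random.default_rng(0)
seg = rng.standard_normal((4, 32))
aligned = clip_loss(seg + 0.01 * rng.standard_normal((4, 32)), seg)
shuffled = clip_loss(seg[::-1], seg)
print(aligned < shuffled)  # True
```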
|
|
| **Step 3:** Train Diffusion Model |
| ```bash |
| python script/trainer_diffusion.py --conf configs/trainer/combined_emb.yaml |
| ``` |
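As a rough guide to what diffusion training optimises, the sketch below shows the standard DDPM forward process and the simplified noise-prediction loss (Ho et al., 2020). The schedule, `T` and the `eps_model(x_t, t, cond)` signature are assumptions; how `script/trainer_diffusion.py` combines the graph embeddings with the denoiser is not shown.

```python
import numpy as np

def q_sample(x0, t, alpha_bars, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def training_loss(eps_model, x0, cond, alpha_bars, rng):
    """Simplified DDPM objective: MSE between the true and predicted noise."""
    t = int(rng.integers(len(alpha_bars)))  # random timestep
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bars, noise)
    return float(np.mean((eps_model(x_t, t, cond) - noise) ** 2))

T = 1000
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
# A model that always predicts zero noise has a loss close to E[noise^2] = 1.
loss = training_loss(lambda x, t, c: np.zeros_like(x), x0, None, alpha_bars, rng)
```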
|
|
|
|
| ## 🔄 Training SurGrID on a New Dataset |
| The following files need to be adapted: |
| - [Configs](./configs) |
| - [SurGrID Dataset](./surgrid/dataset/cadis_dataset.py) |
| - [VQGAN Dataset](./surgrid/taming/taming/data/cadis.py) |
| - [CADIS Specifications in Graph Encoder Pre-training](./surgrid/graph/graph_masked_segclip.py) |
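A common step when adapting these files is mapping a new dataset's raw label IDs, which are often non-contiguous, to the contiguous class indices the configs expect. The helper below is a hypothetical sketch using a lookup table; the `ignore_index` value of 255 is an assumption, and the CADIS-specific handling in the files above may work differently. It assumes non-negative integer labels.

```python
import numpy as np

def remap_labels(mask, mapping, ignore_index=255):
    """Map raw label IDs to contiguous class indices; unmapped IDs become ignore_index."""
    size = max(max(mapping), int(mask.max())) + 1
    lut = np.full(size, ignore_index, dtype=np.int64)  # lookup table indexed by raw ID
    for raw_id, new_id in mapping.items():
        lut[raw_id] = new_id
    return lut[mask]

mask = np.array([[0, 10], [20, 7]])
print(remap_labels(mask, {0: 0, 10: 1, 20: 2}).tolist())  # [[0, 1], [2, 255]]
```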
|
|
|
|
| ## 🥼 Clinical Expert Assessment |
| ```bash |
| python script/demo_surgrid.py --conf configs/trainer/combined_emb.yaml |
| ``` |
| Our demo GUI allows loading ground-truth graphs along with the corresponding ground-truth image. The graph’s nodes can be moved, deleted, or have their class changed. We instruct our participants to load four different ground-truth graphs and to sequentially perform the following actions on each. They are requested to score the samples’ realism and coherence with the graph input on a Likert scale from 1 to 7: |
|
|
| - First, participants are instructed to generate a batch of four samples from the ground-truth SG without modifications. |
| - Second, participants are requested to spatially move nodes on the canvas and again judge the synthesised samples. |
| - Third, participants change the class of one of the instrument nodes and judge the generated images. |
| - Lastly, participants are instructed to remove one of the instrument or miscellaneous-class nodes and judge the synthesised samples a final time. |
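The graph edits in the study (moving a node, changing its class, removing it) can be mimicked on a plain-Python SG. The representation below (nodes keyed by a stable ID, normalised `(x, y)` positions, illustrative class labels) is a hypothetical sketch and not the demo GUI's internal data structure. Each edit returns a new graph, so the ground-truth SG stays untouched for comparison.

```python
import copy

# Hypothetical SG: nodes keyed by a stable ID so deletions do not shift indices.
scene = {
    "nodes": {
        0: {"class": "pupil", "pos": (0.50, 0.50)},
        1: {"class": "cornea", "pos": (0.50, 0.45)},
        2: {"class": "forceps", "pos": (0.70, 0.30)},
    },
    "edges": [(0, 1), (1, 2)],
}

def move_node(graph, node_id, new_pos):
    """Spatial modification: place a node at a new normalised (x, y) position."""
    g = copy.deepcopy(graph)
    g["nodes"][node_id]["pos"] = new_pos
    return g

def change_class(graph, node_id, new_class):
    """Tool modification: swap a node's class label."""
    g = copy.deepcopy(graph)
    g["nodes"][node_id]["class"] = new_class
    return g

def remove_node(graph, node_id):
    """Tool removal: drop a node together with all edges touching it."""
    g = copy.deepcopy(graph)
    del g["nodes"][node_id]
    g["edges"] = [(a, b) for a, b in g["edges"] if node_id not in (a, b)]
    return g

edited = remove_node(change_class(move_node(scene, 2, (0.6, 0.4)), 2, "cannula"), 1)
print(sorted(edited["nodes"]), edited["edges"])  # [0, 2] []
```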
|
|
| <table> |
| <thead> |
| <tr> |
| <th rowspan="2">Clinician</th> |
| <th colspan="2">Synthesisation from GT</th> |
| <th colspan="2">Spatial Modification</th> |
| <th colspan="2">Tool Modification</th> |
| <th colspan="2">Tool Removal</th> |
| </tr> |
| <tr> |
| <th>Realism</th> |
| <th>Coherence</th> |
| <th>Realism</th> |
| <th>Coherence</th> |
| <th>Realism</th> |
| <th>Coherence</th> |
| <th>Realism</th> |
| <th>Coherence</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>P1</td> |
| <td>6.5±0.5</td> |
| <td>6.5±1.0</td> |
| <td>6.3±0.9</td> |
| <td>6.3±0.9</td> |
| <td>5.3±1.2</td> |
| <td>4.5±1.9</td> |
| <td>6.3±0.9</td> |
| <td>5.5±2.3</td> |
| </tr> |
| <tr> |
| <td>P2</td> |
| <td>5.3±0.9</td> |
| <td>5.3±0.5</td> |
| <td>4.5±0.5</td> |
| <td>4.3±2.0</td> |
| <td>5.3±0.9</td> |
| <td>5.8±0.9</td> |
| <td>5.5±1.2</td> |
| <td>5.5±1.9</td> |
| </tr> |
| <tr> |
| <td>P3</td> |
| <td>6.3±0.9</td> |
| <td>6.3±0.9</td> |
| <td>6.5±1.0</td> |
| <td>5.5±0.5</td> |
| <td>6.0±0.8</td> |
| <td>6.8±0.5</td> |
| <td>6.3±0.5</td> |
| <td>6.5±0.5</td> |
| </tr> |
| </tbody> |
| </table> |
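The table does not state how the ± values are computed. Assuming they are the mean ± standard deviation of a clinician's scores across the four graphs, a cell could be produced as below. Whether a population or sample standard deviation was used is not stated; this sketch assumes the population version (`statistics.pstdev`), and the example scores are made up.

```python
from statistics import mean, pstdev

def summarize(scores):
    """Format Likert scores (1 to 7) as 'mean±std' with one decimal, like the table."""
    if not all(1 <= s <= 7 for s in scores):
        raise ValueError("Likert scores must lie in [1, 7]")
    # Assumption: population standard deviation; the table does not specify.
    return f"{mean(scores):.1f}±{pstdev(scores):.1f}"

print(summarize([7, 7, 6, 6]))  # 6.5±0.5 (made-up scores)
```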
| |
|
|
| ## 📜 Citations |
| If you use SurGrID in your research, please cite the following paper: |
| ``` |
| @article{frisch2025surgrid, |
| title={SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion}, |
| author={Frisch, Yannik and Sivakumar, Ssharvien Kumar and K{\"o}ksal, {\c{C}}a{\u{g}}han and B{\"o}hm, Elsa and Wagner, Felix and Gericke, Adrian and Ghazaei, Ghazal and Mukhopadhyay, Anirban}, |
| journal={arXiv preprint arXiv:2502.07945}, |
| year={2025} |
| } |
| ``` |
|
|
|
|
| ## ⭐ Acknowledgement |
| We thank the following projects and theoretical works, which we either used or drew inspiration from: |
| - [VQGAN](https://github.com/CompVis/taming-transformers) |
| - [Lucidrains' DDPM](https://github.com/lucidrains/denoising-diffusion-pytorch) |
| - [SGDiff](https://github.com/YangLing0818/SGDiff) |
| - [Endora's README](https://github.com/CUHK-AIM-Group/Endora) |