---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

<img align="right" src="figure.jpg" alt="teaser" width="100%" style="margin-left: 10px">

This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.

Paper: https://arxiv.org/abs/2503.02579

Code: https://github.com/egeozsoy/MM-OR

**Authors**: [Ege Özsoy][eo], Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, [Nassir Navab][nassir]

[eo]: https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
[nassir]: https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/

## MM-OR Dataset

- To download MM-OR, first fill out this form https://forms.gle/kj47QXEcraQdGidg6 to get access to the download script. By filling out the form, you agree to the dataset's terms of use.
- Run the download script, which automatically downloads the entire dataset (multiple .zip files) and unzips it. Make sure you have `wget` and `unzip` installed.
- Put the newly created `MM-OR_data` folder into the root directory of this project.
- Optionally, download the 4D-OR dataset, put it into the root directory, and rename it to `4D-OR_data`. Instructions are in the official repo: https://github.com/egeozsoy/4D-OR. That repository also provides the newly annotated segmentation annotations and explains how to configure them.
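
The setup steps above can be sanity-checked with a small script. The following is a minimal sketch, not part of the official repository: the function name `check_setup` and the report keys are hypothetical, and it assumes only what the list states (the `wget` and `unzip` tools, an `MM-OR_data` folder in the project root, and an optional `4D-OR_data` folder).

```python
import shutil
from pathlib import Path

# Names taken from the setup steps above; everything else is illustrative.
REQUIRED_TOOLS = ("wget", "unzip")
REQUIRED_DIRS = ("MM-OR_data",)
OPTIONAL_DIRS = ("4D-OR_data",)


def check_setup(project_root="."):
    """Report which tools and data folders are missing (hypothetical helper)."""
    root = Path(project_root)
    return {
        # shutil.which returns None when an executable is not on PATH.
        "missing_tools": [t for t in REQUIRED_TOOLS if shutil.which(t) is None],
        "missing_required_dirs": [d for d in REQUIRED_DIRS if not (root / d).is_dir()],
        "missing_optional_dirs": [d for d in OPTIONAL_DIRS if not (root / d).is_dir()],
    }


if __name__ == "__main__":
    for key, items in check_setup().items():
        print(f"{key}: {', '.join(items) if items else 'none'}")
```

Run it from the project root. An empty `missing_required_dirs` list only means the folder exists, not that the download completed.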

## Panoptic Segmentation and Scene Graph Generation Instructions

Detailed instructions for Panoptic Segmentation and Scene Graph Generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within `panoptic_segmentation` and `scene_graph_generation` for specific instructions and requirements.

```bibtex
@inproceedings{ozsoy2024mmor,
  title={MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments},
  author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
  booktitle={CVPR},
  note={Accepted},
  year={2025}
}
```