	---
	license: apache-2.0
	pipeline_tag: image-text-to-text
	library_name: transformers
	---

	# MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

	<img align="right" src="figure.jpg" alt="teaser" width="100%" style="margin-left: 10px">

	This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.

	Paper: https://arxiv.org/abs/2503.02579

	Code: https://github.com/egeozsoy/MM-OR


	Authors: [Ege Özsoy][eo], Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, [Nassir Navab][nassir]

	[eo]: https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
	[nassir]: https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/


	## MM-OR Dataset
	- To download MM-OR, first fill out this form https://forms.gle/kj47QXEcraQdGidg6 to get access to the download script. By filling out this form, you agree to the terms of use of the dataset.
	- Run the download script, which automatically downloads the entire dataset (multiple .zip files) and unzips it. Make sure you have `wget` and `unzip` installed.
	- Put the newly created MM-OR_data folder into the root directory of this project.
	- Optionally, download the 4D-OR dataset, put it in the root directory, and rename it 4D-OR_data. Instructions are in the official repo: https://github.com/egeozsoy/4D-OR. That repository also contains the newly annotated segmentation annotations and explains how to configure them.
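
The setup steps above can be sanity-checked with a small script. The following is a minimal sketch, not part of the official repository: the function name `check_setup` and the report keys are hypothetical, and it only assumes what the list states, namely that `wget` and `unzip` must be on the `PATH`, that `MM-OR_data` sits in the project root, and that `4D-OR_data` is optional.

```python
import shutil
from pathlib import Path

# Names taken from the setup instructions; everything else is illustrative.
REQUIRED_TOOLS = ("wget", "unzip")
REQUIRED_DIRS = ("MM-OR_data",)
OPTIONAL_DIRS = ("4D-OR_data",)


def check_setup(project_root, which=shutil.which):
    """Report which tools and data folders are missing (hypothetical helper).

    `which` is injectable so the lookup can be replaced in tests.
    """
    root = Path(project_root)
    return {
        "missing_tools": [t for t in REQUIRED_TOOLS if which(t) is None],
        "missing_required_dirs": [
            d for d in REQUIRED_DIRS if not (root / d).is_dir()
        ],
        "missing_optional_dirs": [
            d for d in OPTIONAL_DIRS if not (root / d).is_dir()
        ],
    }


if __name__ == "__main__":
    # Run from the project root after downloading the data.
    for key, items in check_setup(".").items():
        print(f"{key}: {', '.join(items) if items else 'none'}")
```

Running it from the project root before the download shows whether the required tools are available, and running it again afterwards confirms that the data folders ended up where the training and evaluation code presumably expects them.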

	## Panoptic Segmentation and Scene Graph Generation Instructions
	Detailed instructions for Panoptic Segmentation and Scene Graph Generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within `panoptic_segmentation` and `scene_graph_generation` for specific instructions and requirements.


	```bibtex
	@inproceedings{ozsoy2024mmor,
	title={MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments},
	author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
	booktitle={CVPR},
	note={Accepted},
	year={2025}
	}
	```