Title: Towards Holistic Surgical Scene Understanding

URL Source: https://arxiv.org/html/2212.04582

Published Time: Mon, 29 Jan 2024 02:01:31 GMT

Markdown Content:

License: arXiv.org perpetual non-exclusive license

arXiv:2212.04582v4 [cs.CV] 26 Jan 2024

Supplemental Material: Towards Holistic Surgical Scene Understanding

Figure 1: PSI-AVA classes per task. (Left) Phases and steps, ordered as they occur in a prostatectomy procedure. (Right) Class labels for the phase, step, and atomic action recognition tasks and for the instrument detection task. Best viewed in color.

Figure 2: PSI-AVA statistics. Number of annotations per class for the recognition and detection tasks. Colors denote how the annotations are distributed across the fold partition.

Figure 3: Video Feature Extractor architecture. TAPIR builds upon MViT [9], which uses a multiscale pyramidal strategy to extract features with low spatial resolution but high channel dimension from video sequences.
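To make the multiscale pyramidal strategy concrete, the sketch below traces feature shapes through a four-stage pyramid. It assumes an MViT-B-like configuration (a 16x224x224 input clip, a patch-embedding stride of (2, 4, 4), 96 base channels, and spatial size halved while channels double at each stage transition); these values are illustrative and are not taken from TAPIR's actual configuration. The function `pyramid_shapes` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageShape:
    frames: int
    height: int
    width: int
    channels: int


def pyramid_shapes(frames=16, height=224, width=224,
                   patch_stride=(2, 4, 4), base_channels=96, num_stages=4):
    """Per-stage feature shapes for a hypothetical MViT-B-like pyramid.

    The patch embedding downsamples time by patch_stride[0] and space by
    patch_stride[1:]; each later stage halves height and width and doubles
    the channel dimension (assumed schedule, for illustration only).
    """
    t = frames // patch_stride[0]
    h = height // patch_stride[1]
    w = width // patch_stride[2]
    c = base_channels
    shapes = [StageShape(t, h, w, c)]
    for _ in range(num_stages - 1):
        h, w, c = h // 2, w // 2, c * 2
        shapes.append(StageShape(t, h, w, c))
    return shapes


for i, s in enumerate(pyramid_shapes(), start=1):
    tokens = s.frames * s.height * s.width  # spatiotemporal tokens at this stage
    print(f"stage {i}: T={s.frames} H={s.height} W={s.width} "
          f"C={s.channels} tokens={tokens}")
```

Under these assumptions the token count shrinks from 25,088 at the first stage to 392 at the last, while the channel dimension grows from 96 to 768, which is the "low spatial resolution, high channel dimension" trade-off described in the caption.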

Figure 4: Performance comparison between TAPIR and SlowFast [10], grouped into A) long-term and B) short-term tasks. A) For the Phase and Step Recognition tasks, TAPIR's predictions are more temporally continuous, while SlowFast's predictions lose coherence. Supplemental Figure 1 gives the color codes for both tasks. B) Both methods fail to recognize some of the atomic actions, which illustrates the difficulty of the task. However, TAPIR's action predictions remain coherent across the options, unlike SlowFast's (e.g., travel and still). Best viewed in color.
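The caption judges the continuity of phase and step predictions qualitatively. One hypothetical way to quantify it (not a metric reported in the paper) is to count label switches between consecutive frames and the mean length of constant-label segments: a temporally coherent prediction has few switches and long segments. The helper names and the toy label sequences below are illustrative assumptions, not PSI-AVA class names.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def segments(labels: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse a per-frame label sequence into (label, run_length) segments."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


def num_switches(labels: Sequence[Hashable]) -> int:
    """Number of label changes between consecutive frames (0 if constant or empty)."""
    return max(len(segments(labels)) - 1, 0)


def mean_segment_length(labels: Sequence[Hashable]) -> float:
    """Average number of frames per constant-label segment (0.0 if empty)."""
    segs = segments(labels)
    return len(labels) / len(segs) if segs else 0.0


# Hypothetical per-frame phase predictions for the same 8-frame clip.
smooth = ["phase_a"] * 4 + ["phase_b"] * 4
fragmented = ["phase_a", "phase_b"] * 4

for name, preds in [("smooth", smooth), ("fragmented", fragmented)]:
    print(name, num_switches(preds), mean_segment_length(preds))
```

Here the smooth sequence has 1 switch and segments of 4 frames on average, while the fragmented one has 7 switches and 1-frame segments, mirroring the contrast the caption draws between coherent and incoherent long-term predictions.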
