Update README.md
# SASVi - Segment Any Surgical Video

## Overview

SASVi leverages pre-trained frame-wise object detection and segmentation to re-prompt SAM2
for improved surgical video segmentation with scarcely annotated data.

## Example Results

* You can find the complete segmentations of the video datasets [here](https://huggingface.co/SsharvienKumar/SASVi/tree/main/dataset).
* Checkpoints of all the overseers can be found [here](https://huggingface.co/SsharvienKumar/SASVi/tree/main/checkpoints).

## Setup

* Create a virtual environment of your choice and activate it: `conda create -n sasvi python=3.11 && conda activate sasvi`
* Install `torch>=2.3.1` and `torchvision>=0.18.1` following the instructions from [here](https://pytorch.org/get-started/locally/)
* Install the dependencies using `pip install -r requirements.txt`
* Install SDS_Playground from [here](https://github.com/MECLabTUDA/SDS_Playground)
* Install SAM2 using `cd src/sam2 && pip install -e .`
* Place SAM2 [checkpoints](https://github.com/facebookresearch/sam2/tree/main#model-description) at `src/sam2/checkpoints`
* Convert video files to frame folders using `bash helper_scripts/video_to_frames.sh`. The output should be in the format:
```
<video_root>
├── <video1>
│   ├── 0001.jpg
│   ├── 0002.jpg
│   └── ...
├── <video2>
│   ├── 0001.jpg
│   ├── 0002.jpg
│   └── ...
└── ...
```
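Before running inference it can help to confirm that the extracted frames follow this layout. The snippet below is a minimal sketch, not part of the repository: the function names `list_frame_folders` and `find_empty_videos` are hypothetical, and it assumes frames are stored as `.jpg` files directly inside each video folder.

```python
from pathlib import Path


def list_frame_folders(video_root: str, pattern: str = "*.jpg") -> dict[str, list[str]]:
    """Map each video folder under video_root to its sorted frame file names."""
    root = Path(video_root)
    layout = {}
    for video_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Zero-padded names such as 0001.jpg sort correctly as plain strings.
        layout[video_dir.name] = sorted(f.name for f in video_dir.glob(pattern))
    return layout


def find_empty_videos(layout: dict[str, list[str]]) -> list[str]:
    """Return the names of video folders that contain no matching frames."""
    return [name for name, frames in layout.items() if not frames]
```

Calling `find_empty_videos(list_frame_folders("<video_root>"))` should return an empty list when every video folder contains at least one frame.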

## Overseer Model Training

We provide training scripts for three different overseer models (Mask R-CNN, DETR, Mask2Former)
on three different datasets (CaDIS, CholecSeg8k, Cataract1k).

You can run the training scripts as follows:

`python train_scripts/train_<OVERSEER>_<DATASET>.py`
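To see which overseer/dataset combinations are available without guessing the exact spelling, the script names can be parsed according to the `train_<OVERSEER>_<DATASET>.py` pattern. This is a small sketch under that assumption; the helper names are hypothetical, and the file name in the usage note is only an illustration of the pattern, not a confirmed file in the repository.

```python
from pathlib import Path


def parse_train_script(path: str) -> tuple[str, str]:
    """Split 'train_<OVERSEER>_<DATASET>.py' into (overseer, dataset)."""
    stem = Path(path).stem
    # maxsplit=2 keeps any further underscores inside the dataset part.
    prefix, overseer, dataset = stem.split("_", 2)
    if prefix != "train":
        raise ValueError(f"not a training script: {path}")
    return overseer, dataset


def available_trainings(script_dir: str = "train_scripts") -> list[tuple[str, str]]:
    """List (overseer, dataset) pairs for all scripts matching the pattern."""
    scripts = Path(script_dir).glob("train_*_*.py")
    return sorted(parse_train_script(p.name) for p in scripts)
```

For a hypothetical file `train_scripts/train_MaskRCNN_CaDIS.py`, `parse_train_script` would return `("MaskRCNN", "CaDIS")`.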

## SASVi Inference

The video frames need to be extracted beforehand and arranged in the folder layout shown above. Further optional arguments are documented in the script itself.

```
python src/sam2/eval_sasvi.py \
--sam2_cfg configs/sam2.1_hiera_l.yaml \
--sam2_checkpoint ./checkpoints/<SAM2_CHECKPOINT>.pt \
--overseer_checkpoint <PATH_TO_OVERSEER_CHECKPOINT>.pth \
--overseer_type <NAME_OF_OVERSEER> \
--dataset_type <NAME_OF_DATASET> \
--base_video_dir <PATH_TO_VIDEO_ROOT> \
--output_mask_dir <OUTPUT_PATH_TO_SASVi_MASK> \
--overseer_mask_dir <OPTIONAL - OUTPUT_PATH_TO_OVERSEER_MASK>
```

## nnUNet Training & Inference

Fold 0: `nnUNetv2_train DATASET_ID 2d 0 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`

Fold 1: `nnUNetv2_train DATASET_ID 2d 1 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`

Fold 2: `nnUNetv2_train DATASET_ID 2d 2 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`

Fold 3: `nnUNetv2_train DATASET_ID 2d 3 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`

Fold 4: `nnUNetv2_train DATASET_ID 2d 4 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`

Then find the best configuration using:

`nnUNetv2_find_best_configuration DATASET_ID -c 2d -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs`

Run inference using:

`nnUNetv2_predict -d DATASET_ID -i INPUT_FOLDER -o OUTPUT_FOLDER -f 0 1 2 3 4 -tr nnUNetTrainer_400epochs -c 2d -p nnUNetResEncUNetMPlans`

Once inference is completed, run postprocessing:

`nnUNetv2_apply_postprocessing -i OUTPUT_FOLDER -o OUTPUT_FOLDER_PP -pp_pkl_file .../postprocessing.pkl -np 8 -plans_json .../plans.json`
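The five fold-wise training calls, the configuration search, and the prediction call differ only in a few arguments, so they can be generated rather than typed by hand. The following is a sketch, assuming the nnU-Net v2 command-line tools are installed and on the `PATH`; the function name `nnunet_commands` is hypothetical. The postprocessing step is left out because its `.pkl` and `.json` paths depend on the output of the configuration search.

```python
def nnunet_commands(
    dataset_id: str,
    input_folder: str,
    output_folder: str,
    folds: tuple[int, ...] = (0, 1, 2, 3, 4),
    plans: str = "nnUNetResEncUNetMPlans",
    trainer: str = "nnUNetTrainer_400epochs",
) -> list[list[str]]:
    """Return the training, configuration-search and prediction commands in order."""
    ds = str(dataset_id)
    # One training command per cross-validation fold.
    train = [
        ["nnUNetv2_train", ds, "2d", str(fold), "-p", plans, "-tr", trainer, "--npz"]
        for fold in folds
    ]
    find_best = [
        "nnUNetv2_find_best_configuration", ds, "-c", "2d", "-p", plans, "-tr", trainer,
    ]
    predict = [
        "nnUNetv2_predict", "-d", ds, "-i", input_folder, "-o", output_folder,
        "-f", *[str(fold) for fold in folds],
        "-tr", trainer, "-c", "2d", "-p", plans,
    ]
    return train + [find_best, predict]
```

Each entry can be printed with `" ".join(cmd)` for inspection or executed in sequence with `subprocess.run(cmd, check=True)`.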

## Evaluation

* For frame-wise segmentation evaluation:
  * `python eval_scripts/eval_<OVERSEER>_frames.py`
* For frame-wise segmentation prediction on full videos:
  * See `eval_scripts/eval_MaskRCNN_videos.py` for an example.
* For video evaluation:
  1. E.g. `python eval_scripts/eval_vid_T.py --segm_root <path_to_segmentation_root> --vid_pattern 'train' --mask_pattern '*.npz' --ignore 255 --device cuda`
  2. E.g. `python eval_scripts/eval_vid_F.py --segm_root <path_to_segmentation_root> --frames_root <path_to_frames_root> --vid_pattern 'train' --frames_pattern '*.jpg' --mask_pattern '*.npz' --raft_iters 12 --device cuda`

## TODOs

* [ ] **The code will be refactored soon to be more modular and reusable!**
* [ ] Pre-process Cholec80 videos with out-of-body detection
* [ ] Improve SASVi by combining it with GT prompting (if available)
* [ ] Test SAM2 finetuning

## Citation

If you use SASVi in your research, please cite our paper:

```
@article{sivakumar2025sasvi,
  title={SASVi: segment any surgical video},
  author={Sivakumar, Ssharvien Kumar and Frisch, Yannik and Ranem, Amin and Mukhopadhyay, Anirban},
  journal={International Journal of Computer Assisted Radiology and Surgery},
  pages={1--11},
  year={2025},
  publisher={Springer}
}
```