SsharvienKumar committed
Commit 7c7596e · verified · 1 Parent(s): c626c9e

Update README.md

  1. README.md +124 -3
README.md CHANGED
@@ -1,3 +1,124 @@
1
- ---
2
- license: cc-by-4.0
3
- ---
1
+ # SASVi - Segment Any Surgical Video
2
+
3
+
4
+ ## Overview
5
+
6
+ SASVi leverages pre-trained frame-wise object detection and segmentation to re-prompt SAM2
7
+ for improved surgical video segmentation with scarcely annotated data.
8
+
9
+
10
+ ## Example Results
11
+
12
+ * You can find the complete segmentations of the video datasets [here](https://huggingface.co/SsharvienKumar/SASVi/tree/main/dataset).
13
+ * Checkpoints of all the overseers can be found [here](https://huggingface.co/SsharvienKumar/SASVi/tree/main/checkpoints).
14
+
15
+
16
+ ## Setup
17
+
18
+ * Create a virtual environment of your choice and activate it: `conda create -n sasvi python=3.11 && conda activate sasvi`
19
+ * Install `torch>=2.3.1` and `torchvision>=0.18.1` following the instructions from [here](https://pytorch.org/get-started/locally/)
20
+ * Install the dependencies using `pip install -r requirements.txt`
21
+ * Install SDS_Playground from [here](https://github.com/MECLabTUDA/SDS_Playground)
22
+ * Install SAM2 using `cd src/sam2 && pip install -e .`
23
+ * Place SAM2 [checkpoints](https://github.com/facebookresearch/sam2/tree/main#model-description) at `src/sam2/checkpoints`
24
+ * Convert video files to frame folders using `bash helper_scripts/video_to_frames.sh`. The output should be in the format:
25
+ ```
26
+ <video_root>
27
+ ├── <video1>
27
+ │   ├── 0001.jpg
28
+ │   ├── 0002.jpg
29
+ │   └── ...
30
+ ├── <video2>
31
+ │   ├── 0001.jpg
32
+ │   ├── 0002.jpg
33
+ │   └── ...
34
+ └── ...
36
+ ```
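
Before running inference, it can help to confirm that the extracted frames match the layout above. The following is a minimal sketch, not part of the repository: the function name `check_frame_folders` is hypothetical, and it assumes that frames are named with four-digit, zero-padded, consecutive numbers starting at `0001.jpg`, as the example tree suggests.

```python
from pathlib import Path


def check_frame_folders(video_root):
    """Return {video_name: frame_count} for every sub-folder of video_root.

    Assumption (inferred from the example tree, not confirmed by the repo):
    frames are named 0001.jpg, 0002.jpg, ... with no gaps.
    Raises ValueError if a video folder is empty or its names deviate.
    """
    counts = {}
    video_dirs = sorted(p for p in Path(video_root).iterdir() if p.is_dir())
    for video_dir in video_dirs:
        frames = sorted(f.name for f in video_dir.glob("*.jpg"))
        if not frames:
            raise ValueError(f"{video_dir.name}: no .jpg frames found")
        expected = [f"{i:04d}.jpg" for i in range(1, len(frames) + 1)]
        if frames != expected:
            raise ValueError(f"{video_dir.name}: unexpected frame names")
        counts[video_dir.name] = len(frames)
    return counts
```

Calling `check_frame_folders("<video_root>")` returns the number of frames per video, or raises an error naming the first folder that does not follow the assumed naming scheme.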
37
+
38
+
39
+ ## Overseer Model Training
40
+
41
+ We provide training scripts for three different overseer models (Mask R-CNN, DETR, Mask2Former)
42
+ on three different datasets (CaDIS, CholecSeg8k, Cataract1k).
43
+
44
+ You can run the training scripts as follows:
45
+
46
+ `python train_scripts/train_<OVERSEER>_<DATASET>.py`
47
+
48
+
49
+ ## SASVi Inference
50
+
51
+ The video frames need to be extracted beforehand and arranged in the folder format shown above. Additional optional arguments are documented in the script itself.
52
+
53
+ ```
54
+ python src/sam2/eval_sasvi.py \
55
+ --sam2_cfg configs/sam2.1_hiera_l.yaml \
56
+ --sam2_checkpoint ./checkpoints/<SAM2_CHECKPOINT>.pt \
57
+ --overseer_checkpoint <PATH_TO_OVERSEER_CHECKPOINT>.pth \
58
+ --overseer_type <NAME_OF_OVERSEER> \
59
+ --dataset_type <NAME_OF_DATASET> \
60
+ --base_video_dir <PATH_TO_VIDEO_ROOT> \
61
+ --output_mask_dir <OUTPUT_PATH_TO_SASVi_MASK> \
62
+ --overseer_mask_dir <OPTIONAL - OUTPUT_PATH_TO_OVERSEER_MASK>
63
+ ```
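
When running inference over several videos or overseers, the command above can be assembled programmatically. The sketch below is an illustration only: the helper `build_sasvi_command` is hypothetical, it uses just the flags listed in the command above, and it leaves out `--overseer_mask_dir` unless a path is given, since that argument is marked optional.

```python
def build_sasvi_command(sam2_checkpoint, overseer_checkpoint, overseer_type,
                        dataset_type, base_video_dir, output_mask_dir,
                        overseer_mask_dir=None,
                        sam2_cfg="configs/sam2.1_hiera_l.yaml"):
    """Build the eval_sasvi.py argument list from the flags shown above."""
    cmd = [
        "python", "src/sam2/eval_sasvi.py",
        "--sam2_cfg", sam2_cfg,
        "--sam2_checkpoint", sam2_checkpoint,
        "--overseer_checkpoint", overseer_checkpoint,
        "--overseer_type", overseer_type,
        "--dataset_type", dataset_type,
        "--base_video_dir", base_video_dir,
        "--output_mask_dir", output_mask_dir,
    ]
    # The overseer mask directory is optional, so only add it when provided.
    if overseer_mask_dir is not None:
        cmd += ["--overseer_mask_dir", overseer_mask_dir]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The accepted values for `--overseer_type` and `--dataset_type` are defined in the script, so check them there rather than guessing.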
64
+
65
+
66
+ ## nnUNet Training & Inference
67
+
68
+ Fold 0: `nnUNetv2_train DATASET_ID 2d 0 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
69
+
70
+ Fold 1: `nnUNetv2_train DATASET_ID 2d 1 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
71
+
72
+ Fold 2: `nnUNetv2_train DATASET_ID 2d 2 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
73
+
74
+ Fold 3: `nnUNetv2_train DATASET_ID 2d 3 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
75
+
76
+ Fold 4: `nnUNetv2_train DATASET_ID 2d 4 -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz`
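
The five fold commands differ only in the fold index, so they can be generated in a loop. This is a small sketch under that assumption; the function name `nnunet_train_commands` is hypothetical, and `DATASET_ID` remains a placeholder that you replace with your own dataset ID.

```python
def nnunet_train_commands(dataset_id, folds=range(5)):
    """Return one nnUNetv2_train command string per cross-validation fold."""
    return [
        f"nnUNetv2_train {dataset_id} 2d {fold} "
        "-p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs --npz"
        for fold in folds
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or piped into a shell.
    for command in nnunet_train_commands("DATASET_ID"):
        print(command)
```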
77
+
78
+
79
+ Then find the best configuration using
80
+
81
+ `nnUNetv2_find_best_configuration DATASET_ID -c 2d -p nnUNetResEncUNetMPlans -tr nnUNetTrainer_400epochs`
82
+
83
+ And run inference using
84
+
85
+ `nnUNetv2_predict -d DATASET_ID -i INPUT_FOLDER -o OUTPUT_FOLDER -f 0 1 2 3 4 -tr nnUNetTrainer_400epochs -c 2d -p nnUNetResEncUNetMPlans`
86
+
87
+ Once inference is completed, run postprocessing
88
+
89
+ `nnUNetv2_apply_postprocessing -i OUTPUT_FOLDER -o OUTPUT_FOLDER_PP -pp_pkl_file .../postprocessing.pkl -np 8 -plans_json .../plans.json`
90
+
91
+
92
+ ## Evaluation
93
+
94
+ * For frame-wise segmentation evaluation:
95
+ * `python eval_scripts/eval_<OVERSEER>_frames.py`
96
+ * For frame-wise segmentation prediction on full videos:
97
+ * See `eval_scripts/eval_MaskRCNN_videos.py` for an example.
98
+ * For video evaluation:
99
+ 1. E.g. `python eval_scripts/eval_vid_T.py --segm_root <path_to_segmentation_root> --vid_pattern 'train' --mask_pattern '*.npz' --ignore 255 --device cuda`
100
+ 2. E.g. `python eval_scripts/eval_vid_F.py --segm_root <path_to_segmentation_root> --frames_root <path_to_frames_root> --vid_pattern 'train' --frames_pattern '*.jpg' --mask_pattern '*.npz' --raft_iters 12 --device cuda`
101
+
102
+
103
+ ## TODOs
104
+
105
+ * [ ] **The code will be refactored soon to be more modular and reusable!**
106
+ * [ ] Pre-process Cholec80 videos with out-of-body detection
107
+ * [ ] Improve SASVi by combining it with GT prompting (if available)
108
+ * [ ] Test SAM2 finetuning
109
+
110
+
111
+ ## Citation
112
+
113
+ If you use SASVi in your research, please cite our paper:
114
+
115
+ ```
116
+ @article{sivakumar2025sasvi,
117
+ title={SASVi: segment any surgical video},
118
+ author={Sivakumar, Ssharvien Kumar and Frisch, Yannik and Ranem, Amin and Mukhopadhyay, Anirban},
119
+ journal={International Journal of Computer Assisted Radiology and Surgery},
120
+ pages={1--11},
121
+ year={2025},
122
+ publisher={Springer}
123
+ }
124
+ ```