Bopa-Boptech committed · verified
Commit 610b3c3 · 1 Parent(s): c980983

Create README.md

Files changed (1): README.md (+60, -0)
README.md ADDED
@@ -0,0 +1,60 @@
---
license: mit
language:
- en
base_model:
- Bopa-Boptech/VLM-Cholecystectomie
pipeline_tag: video-classification
tags:
- medical
- surgery
- cholecystectomie
- wip
---

# VLM-Cholecystectomie

This repository contains models for **Surgical Phase, Step, Target and Tool Recognition** in laparoscopic cholecystectomy videos.\
It features two distinct approaches: a lightweight custom **ViT (ResNet+Transformer)** and a large-scale finetuned **Qwen3-VL**.

## Models

### 1. ViT-ResNet
A lightweight, specialized architecture designed for efficient video classification.
- **Backbone:** ResNet50 (frozen)
- **Aggregator:** Temporal Transformer Encoder
- **Heads:** MLP heads for **Phase**, **Step**, **Target** and **Tool** prediction
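
The architecture described above can be sketched as a minimal PyTorch module. This is an illustrative outline, not the repository's actual `SurgicalTransformer`: the small convolutional stem stands in for the frozen ResNet50, and the feature dimension and head sizes are assumptions.

```python
import torch
import torch.nn as nn

class SurgicalViTSketch(nn.Module):
    """Illustrative sketch: frozen per-frame backbone -> temporal Transformer -> MLP heads."""
    def __init__(self, vocab_size_dict, feat_dim=128):
        super().__init__()
        # Stand-in for a frozen ResNet50: any per-frame feature extractor fits here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        for p in self.backbone.parameters():
            p.requires_grad = False  # frozen, as stated above
        # Temporal aggregator over the sequence of per-frame features.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        # One MLP head per task (phase / step / target / tool).
        self.heads = nn.ModuleDict({
            task: nn.Linear(feat_dim, n_classes)
            for task, n_classes in vocab_size_dict.items()
        })

    def forward(self, clip):  # clip: [Batch, Time, Channels, Height, Width]
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1)  # [B, T, D]
        pooled = self.temporal(feats).mean(dim=1)                 # [B, D]
        return {task: head(pooled) for task, head in self.heads.items()}

# Head sizes here are illustrative only.
model = SurgicalViTSketch({"phase": 7, "step": 30, "target": 29, "tool": 7})
out = model(torch.randn(1, 4, 3, 64, 64))  # one clip of 4 frames
```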

### 2. Qwen3-VL
A Vision-Language Model finetuned for surgical understanding.
- **Base Model:** `unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit`
- **Method:** LoRA finetuning (vision & language layers)

## Tasks
The models are trained to predict four levels of surgical granularity simultaneously:
1. **Phase:** High-level surgical stages (e.g., `PREPARATION`, `CALOT_TRIANGLE_DISSECTION`).
2. **Step:** Fine-grained surgical actions (e.g., `CYSTIC_DUCT_DISSECTION`, `CLIPPING`).
3. **Target:** The anatomical structure or object being operated on (e.g., `CYSTIC_ARTERY`, `GALLBLADDER`).
4. **Tool(s):** The list of tool(s) being actively used in the surgery (e.g., `GRASPER HOOK`, `GRASPER`).
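
For the single-label tasks (phase, step, target), a prediction is typically decoded by taking each head's highest-scoring class; the multi-label tool head would instead be thresholded. A minimal pure-Python sketch, using made-up label lists (the real label vocabularies ship with the checkpoints):

```python
# Hypothetical label vocabularies for illustration only.
LABELS = {
    "phase": ["PREPARATION", "CALOT_TRIANGLE_DISSECTION", "CLIPPING_CUTTING"],
    "target": ["CYSTIC_ARTERY", "CYSTIC_DUCT", "GALLBLADDER"],
}

def decode(scores_per_head):
    """Map each head's list of class scores to the name of its top-scoring class."""
    decoded = {}
    for head, scores in scores_per_head.items():
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        decoded[head] = LABELS[head][best]
    return decoded

print(decode({"phase": [0.1, 0.8, 0.1], "target": [0.2, 0.1, 0.7]}))
# {'phase': 'CALOT_TRIANGLE_DISSECTION', 'target': 'GALLBLADDER'}
```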

## Usage

### Inference with ViT-ResNet
The ViT model requires the specific architecture definition (available in the `src` folder of the associated code repository or the Space).

```python
import torch
from model_utils import SurgicalTransformer  # Custom class from the code repository

# Load Model (head sizes must match the checkpoint's label vocabularies)
model = SurgicalTransformer(vocab_size_dict={"phase": 7, "step": 30, "target": 29})
checkpoint = torch.load("models/vit_v1/vit_resampling_v1.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Inference
# input_tensor: [Batch, Time, Channels, Height, Width]
input_tensor = torch.randn(1, 16, 3, 224, 224)  # dummy clip; replace with real preprocessed frames
with torch.no_grad():
    output = model(input_tensor)
```
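
Building that input tensor from raw video frames can be sketched as follows. The 224×224 resolution and ImageNet mean/std normalization are common defaults for a ResNet50 backbone, not values confirmed by this repository; swap in the repository's own preprocessing if it differs.

```python
import torch

# ImageNet channel statistics, a common default for ResNet50 backbones (assumption).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def frames_to_clip(frames):
    """Stack a list of [H, W, 3] uint8 frames into a [1, T, 3, H, W] float tensor."""
    processed = []
    for frame in frames:
        x = frame.permute(2, 0, 1).float() / 255.0  # [3, H, W], scaled to [0, 1]
        x = (x - IMAGENET_MEAN) / IMAGENET_STD      # per-channel normalization
        processed.append(x)
    return torch.stack(processed).unsqueeze(0)      # add batch dim: [1, T, 3, H, W]

# Dummy 16-frame clip of black 224x224 frames, standing in for decoded video.
clip = frames_to_clip([torch.zeros(224, 224, 3, dtype=torch.uint8) for _ in range(16)])
print(clip.shape)  # torch.Size([1, 16, 3, 224, 224])
```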