Bopa-Boptech/VLM-Cholecystectomie

Bopa-Boptech committed on Feb 17

Commit 610b3c3 (verified) · 1 Parent(s): c980983

Create README.md

Files changed (1)

README.md ADDED +60 -0

	@@ -0,0 +1,60 @@

+---
+license: mit
+language:
+- en
+base_model:
+- Bopa-Boptech/VLM-Cholecystectomie
+pipeline_tag: video-classification
+tags:
+- medical
+- surgery
+- cholecystectomie
+- wip
+---
+# VLM-Cholecystectomie
+This repository contains models for **Surgical Phase, Step, Target and Tool Recognition** in laparoscopic cholecystectomy videos.\
+It features two distinct approaches: a lightweight custom **ViT (ResNet+Transformer)** and a larger fine-tuned **Qwen3-VL** vision-language model.
+## Models
+### 1. ViT-ResNet
+A lightweight, specialized architecture designed for efficient video classification.
+- **Backbone:** ResNet50 (Frozen)
+- **Aggregator:** Temporal Transformer Encoder
+- **MLP Heads:** one head each for **Phase**, **Step**, **Target** and **Tool** prediction.
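The three components listed above can be sketched in plain PyTorch. This is a minimal illustration, not the repository's `SurgicalTransformer`: the class name `TinySurgicalTransformer`, the small convolutional stand-in for ResNet50, the feature size, the mean pooling over time, and the absence of positional encodings are all assumptions made to keep the example short and free of weight downloads. The class counts reuse the values from the usage snippet in this README, which lists no entry for the tool head.

```python
import torch
import torch.nn as nn


class TinySurgicalTransformer(nn.Module):
    """Hypothetical sketch: frozen frame encoder -> temporal transformer -> one MLP head per task."""

    def __init__(self, num_classes, feat_dim=128, num_layers=2, num_heads=4):
        super().__init__()
        # Stand-in for the frozen ResNet50 backbone (assumption: any per-frame encoder fits here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        for param in self.backbone.parameters():
            param.requires_grad = False  # frozen: the backbone receives no gradient updates

        # Temporal aggregator over the sequence of per-frame features.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads,
            dim_feedforward=4 * feat_dim, batch_first=True,
        )
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)

        # One small MLP head per prediction task.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, n)
            )
            for name, n in num_classes.items()
        })

    def forward(self, clip):
        # clip: [Batch, Time, Channels, Height, Width]
        b, t, c, h, w = clip.shape
        with torch.no_grad():
            feats = self.backbone(clip.reshape(b * t, c, h, w))  # [B*T, feat_dim]
        feats = self.temporal(feats.reshape(b, t, -1))           # [B, T, feat_dim]
        pooled = feats.mean(dim=1)                               # assumed pooling: mean over time
        return {name: head(pooled) for name, head in self.heads.items()}


model = TinySurgicalTransformer({"phase": 7, "step": 30, "target": 29}).eval()
with torch.no_grad():
    logits = model(torch.randn(2, 8, 3, 64, 64))
print({name: tuple(out.shape) for name, out in logits.items()})
```

Returning a dictionary keyed by task name keeps the heads independent, so a further head can be added without touching the backbone or the aggregator.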
+### 2. Qwen3-VL
+A Vision-Language Model finetuned for surgical understanding.
+- **Base Model:** `unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit`
+- **Method:** LoRA Finetuning (Vision & Language layers)
+## Tasks
+The models are trained to predict four labels simultaneously, at different levels of surgical granularity:
+1.  **Phase:** High-level surgical stages (e.g., `PREPARATION`, `CALOT_TRIANGLE_DISSECTION`).
+2.  **Step:** Fine-grained surgical actions (e.g., `CYSTIC_DUCT_DISSECTION`, `CLIPPING`).
+3.  **Target:** The anatomical structure or object being operated on (e.g., `CYSTIC_ARTERY`, `GALLBLADDER`).
+4.  **Tool(s):** The list of tools in active use (e.g., `GRASPER HOOK`, `GRASPER`).
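One way to hold the four outputs listed above is a small record type with three single labels and a list of tools. The sketch below is illustrative only: `SurgicalPrediction`, `parse_prediction`, and the comma-separated tool string are hypothetical names and formats, not part of the repository. The label values are the examples quoted in the list.

```python
from dataclasses import dataclass, field


@dataclass
class SurgicalPrediction:
    """Hypothetical container for one prediction: three single labels plus a tool list."""
    phase: str
    step: str
    target: str
    tools: list = field(default_factory=list)


def parse_prediction(raw: dict) -> SurgicalPrediction:
    """Build a SurgicalPrediction from a dict; tools may be a list or a comma-separated string."""
    tools = raw.get("tools", [])
    if isinstance(tools, str):
        # Assumed text format: tool names separated by commas.
        tools = [name.strip() for name in tools.split(",") if name.strip()]
    return SurgicalPrediction(
        phase=raw["phase"], step=raw["step"], target=raw["target"], tools=list(tools)
    )


pred = parse_prediction({
    "phase": "CALOT_TRIANGLE_DISSECTION",
    "step": "CLIPPING",
    "target": "CYSTIC_ARTERY",
    "tools": "GRASPER HOOK, GRASPER",
})
print(pred)
```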
+## Usage
+### Inference with ViT-ResNet
+The ViT model requires the specific architecture definition (available in the `src` folder of the associated code repository or the Space).
+```python
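+import torch
+from model_utils import SurgicalTransformer  # custom class from the code repository
+
+# Build the model; class counts per head are those given in this README
+model = SurgicalTransformer(vocab_size_dict={"phase": 7, "step": 30, "target": 29})
+# Load the checkpoint on CPU so the example also runs without a GPU
+checkpoint = torch.load("models/vit_v1/vit_resampling_v1.pt", map_location="cpu")
+model.load_state_dict(checkpoint["model_state_dict"])
+model.eval()
+
+# Inference
+# input_tensor: preprocessed clip of shape [Batch, Time, Channels, Height, Width]
+with torch.no_grad():  # no gradients are needed at inference time
+    output = model(input_tensor)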
+```
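The snippet above expects an `input_tensor` of shape `[Batch, Time, Channels, Height, Width]` but does not show how to build one. The following is a rough sketch under stated assumptions: the helper name `frames_to_clip`, the clip length of 16 frames, the 224x224 resolution, uniform frame sampling, and ImageNet normalization are guesses at a typical ResNet50 pipeline, and the repository's actual preprocessing may differ.

```python
import torch
import torch.nn.functional as F

# Assumed normalization constants (standard ImageNet statistics).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def frames_to_clip(frames, num_frames=16, size=224):
    """Turn a list of HxWx3 uint8 frames into a [1, T, 3, size, size] float tensor."""
    # Uniformly sample `num_frames` indices across the whole video.
    idx = torch.linspace(0, len(frames) - 1, num_frames).round().long()
    clip = torch.stack([frames[i] for i in idx.tolist()])  # [T, H, W, 3]
    clip = clip.permute(0, 3, 1, 2).float() / 255.0         # [T, 3, H, W], values in [0, 1]
    clip = F.interpolate(clip, size=(size, size), mode="bilinear", align_corners=False)
    clip = (clip - IMAGENET_MEAN) / IMAGENET_STD             # per-channel normalization
    return clip.unsqueeze(0)                                 # add the batch dimension


# Random frames stand in for decoded video frames.
video = [torch.randint(0, 256, (480, 854, 3), dtype=torch.uint8) for _ in range(100)]
input_tensor = frames_to_clip(video)
print(input_tensor.shape)
```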