# PitVQA Motion Model

Specialized model for temporal motion detection between surgical video frames.
## Description

This model analyzes pairs of frames to detect instrument motion. It outputs structured motion annotations:

```xml
<motion instrument="suction" direction="left" magnitude="moderate"/>
```
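Since the annotation is a self-closing XML tag, its attributes can be recovered with the standard library. This is a minimal sketch; the `parse_motion` helper is illustrative, not part of this repository:

```python
import xml.etree.ElementTree as ET

def parse_motion(output: str) -> dict:
    """Parse a <motion .../> annotation into a dict of its attributes."""
    elem = ET.fromstring(output.strip())
    return dict(elem.attrib)

result = parse_motion('<motion instrument="suction" direction="left" magnitude="moderate"/>')
print(result)  # {'instrument': 'suction', 'direction': 'left', 'magnitude': 'moderate'}
```

In practice you may want to wrap the call in a `try`/`except ET.ParseError` in case the model emits free-form text instead of a well-formed tag.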
## Training

- **Base model:** Qwen/Qwen2-VL-2B-Instruct
- **Method:** Supervised fine-tuning (SFT) with LoRA
- **Dataset:** Motion annotations from PitVis-2023
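A LoRA setup of this kind is typically expressed as a `peft.LoraConfig`. The sketch below shows the general shape; the rank, alpha, dropout, and target modules are assumptions for illustration, since the card does not publish the actual hyperparameters:

```python
from peft import LoraConfig

# Hypothetical hyperparameters -- not the values used to train this adapter.
lora_config = LoraConfig(
    r=16,                                 # assumed LoRA rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # common choice for attention layers
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```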
## Usage

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load the base model, then attach the motion LoRA adapter
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",  # pass torch_dtype / device_map as needed
)
model = PeftModel.from_pretrained(base, "mmrech/pitvqa-qwen2vl-motion")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Provide two frames (e.g. PIL images) for motion analysis
messages = [{"role": "user", "content": [
    {"type": "image", "image": frame1},
    {"type": "image", "image": frame2},
    {"type": "text", "text": "Describe the instrument motion between these frames."},
]}]
```
## Related Models

- `pitvqa-qwen2vl-unified-v2`: multi-task model including motion detection (Stage 3)