PitVQA Motion Model

Specialized model for temporal motion detection between surgical video frames.

Description

This model analyzes pairs of frames to detect instrument motion. It outputs a structured motion annotation as an XML-style tag giving the instrument, direction, and magnitude:

<motion instrument="suction" direction="left" magnitude="moderate"/>
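Downstream code will usually want the tag as structured data rather than raw text. The following is a minimal sketch of one way to extract it, assuming the model emits self-closing motion tags like the one above, possibly surrounded by other text. The helper name parse_motion is hypothetical and not part of this repository.

```python
import re
import xml.etree.ElementTree as ET

# Matches self-closing tags such as <motion instrument="suction" .../>
MOTION_TAG = re.compile(r"<motion\b[^>]*/>")


def parse_motion(text):
    """Return one attribute dict per well-formed <motion .../> tag in text."""
    results = []
    for match in MOTION_TAG.finditer(text):
        try:
            element = ET.fromstring(match.group(0))
        except ET.ParseError:
            # Skip tags the model produced with malformed attributes.
            continue
        results.append(dict(element.attrib))
    return results


if __name__ == "__main__":
    sample = '<motion instrument="suction" direction="left" magnitude="moderate"/>'
    print(parse_motion(sample))
```

Returning a list keeps the helper usable if the model emits several tags in one response, and malformed tags are dropped rather than raising.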

Training

Base: Qwen/Qwen2-VL-2B-Instruct
Method: SFT with LoRA
Dataset: Motion annotations from PitVis-2023

Usage

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

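# Load the base model (remaining loading arguments omitted)
base = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", ...)
# Attach the motion LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(base, "mmrech/pitvqa-qwen2vl-motion")
# Processor taken from the base checkpoint (an assumption; the adapter repo may ship its own)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")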

# Provide two frames for motion analysis
messages = [{"role": "user", "content": [
    {"type": "image", "image": frame1},
    {"type": "image", "image": frame2},
    {"type": "text", "text": "Describe the instrument motion between these frames."}
]}]
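The snippet above stops after building the messages. As a hedged sketch of the remaining steps, the function below renders the chat template, preprocesses both frames, generates, and decodes only the new tokens. It assumes the frames are PIL images and that model and processor are the PEFT-wrapped model and a Qwen2-VL processor; the name describe_motion and the max_new_tokens default are illustrative choices, not settings documented for this model.

```python
def describe_motion(model, processor, frame1, frame2, max_new_tokens=64):
    """Run one two-frame motion query and return the decoded answer text."""
    messages = [{"role": "user", "content": [
        {"type": "image", "image": frame1},
        {"type": "image", "image": frame2},
        {"type": "text", "text": "Describe the instrument motion between these frames."},
    ]}]
    # Render the conversation to a prompt string with image placeholders.
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Tokenize the prompt and preprocess both frames together.
    inputs = processor(text=[prompt], images=[frame1, frame2], return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the generated answer is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_ids = output_ids[:, prompt_length:]
    return processor.batch_decode(new_ids, skip_special_tokens=True)[0]
```

A call such as describe_motion(model, processor, frame1, frame2) would then return the raw response string, which is expected to contain a motion tag in the format shown in the description.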

Related Models

pitvqa-qwen2vl-unified-v2 - Multi-task model that includes motion detection (Stage 3)
