PitVQA Motion Model

Specialized model for temporal motion detection between surgical video frames.

Description

This model analyzes pairs of frames to detect instrument motion. It outputs structured motion annotations:

<motion instrument="suction" direction="left" magnitude="moderate"/>
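Because the model emits its prediction as a single XML-like tag, the output can be parsed with the Python standard library. A minimal sketch (the attribute names follow the example above; a regex first isolates the tag in case the model emits surrounding text):

```python
import re
import xml.etree.ElementTree as ET

def parse_motion(output: str) -> dict:
    """Extract the <motion .../> tag from model output and return its attributes."""
    match = re.search(r"<motion\b[^>]*/>", output)
    if match is None:
        return {}
    return ET.fromstring(match.group(0)).attrib

print(parse_motion('<motion instrument="suction" direction="left" magnitude="moderate"/>'))
# → {'instrument': 'suction', 'direction': 'left', 'magnitude': 'moderate'}
```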

Training

  • Base: Qwen/Qwen2-VL-2B-Instruct
  • Method: SFT with LoRA
  • Dataset: Motion annotations from PitVis-2023
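LoRA fine-tuning freezes the base weights and learns a low-rank update: each adapted weight matrix W is effectively replaced by W + (alpha/r) * B @ A, where A and B are small trainable matrices of rank r. A NumPy sketch of that update (the dimensions, rank, and scaling here are illustrative, not the values used for this adapter):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16   # illustrative sizes, not this adapter's config

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

# Effective weight after merging the adapter (what a merged LoRA applies at inference)
W_adapted = W + (alpha / r) * B @ A

# With B initialized to zero, training starts from the base model's behavior
assert np.allclose(W_adapted, W)
```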

Usage

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "mmrech/pitvqa-qwen2vl-motion")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Provide two frames (e.g. PIL images) for motion analysis
messages = [{"role": "user", "content": [
    {"type": "image", "image": frame1},
    {"type": "image", "image": frame2},
    {"type": "text", "text": "Describe the instrument motion between these frames."}
]}]

# Build the prompt, run generation, and decode the motion annotation
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[frame1, frame2], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
