Image-Text-to-Text
PEFT
Safetensors
medical
vision-language
surgical-ai
pituitary-surgery
motion-detection
temporal
conversational
PitVQA Motion Model
Specialized model for temporal motion detection between surgical video frames.
Description
This model takes a pair of frames and detects instrument motion between them. It outputs a structured motion annotation that names the instrument, the direction, and the magnitude:
<motion instrument="suction" direction="left" magnitude="moderate"/>
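Downstream code will usually want the annotation as structured data rather than a raw string. The following is a minimal sketch of one way to extract it, assuming the generated text contains zero or more self-closing `<motion .../>` tags shaped like the example above. The helper name `parse_motions` is hypothetical and not part of this model or any library.

```python
import re
import xml.etree.ElementTree as ET

# Matches self-closing <motion .../> tags anywhere in the generated text.
MOTION_TAG = re.compile(r"<motion\b[^>]*/>")


def parse_motions(text: str) -> list[dict]:
    """Return one attribute dict per <motion .../> tag found in `text`.

    Hypothetical helper: assumes each tag is well-formed XML, as in the
    example annotation. A malformed tag raises xml.etree.ElementTree.ParseError.
    """
    return [dict(ET.fromstring(tag).attrib) for tag in MOTION_TAG.findall(text)]


example = '<motion instrument="suction" direction="left" magnitude="moderate"/>'
print(parse_motions(example))
# [{'instrument': 'suction', 'direction': 'left', 'magnitude': 'moderate'}]
```

Text without any `<motion>` tag yields an empty list, which lets the caller distinguish "no motion reported" from a parsing failure.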
Training
- Base: Qwen/Qwen2-VL-2B-Instruct
- Method: SFT with LoRA
- Dataset: Motion annotations from PitVis-2023
Usage
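from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
# Load the base model, then attach the LoRA adapter on top of it
base = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", ...)
model = PeftModel.from_pretrained(base, "mmrech/pitvqa-qwen2vl-motion")
# The base model's processor handles the chat template and image preprocessing
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Provide two frames for motion analysis (frame1 and frame2 are the images to compare)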
messages = [{"role": "user", "content": [
{"type": "image", "image": frame1},
{"type": "image", "image": frame2},
{"type": "text", "text": "Describe the instrument motion between these frames."}
]}]
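The snippet above stops at building the chat messages. As a rough sketch of the remaining steps, the function below applies the chat template, runs generation, and decodes only the newly generated tokens. It assumes `model` is the adapter-wrapped model loaded as above, `processor` comes from `AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")`, and both frames are PIL images. The helper names `build_motion_messages` and `describe_motion` and the `max_new_tokens` default are illustrative choices, not values from the model authors.

```python
DEFAULT_PROMPT = "Describe the instrument motion between these frames."


def build_motion_messages(frame1, frame2, prompt=DEFAULT_PROMPT):
    """Build the two-image user message in the same layout as the usage snippet."""
    return [{"role": "user", "content": [
        {"type": "image", "image": frame1},
        {"type": "image", "image": frame2},
        {"type": "text", "text": prompt},
    ]}]


def describe_motion(model, processor, frame1, frame2, max_new_tokens=64):
    """Hypothetical helper: run one motion query and return the decoded text.

    Assumes `frame1` and `frame2` are PIL images and `max_new_tokens=64`
    is long enough for the annotation (an untested assumption).
    """
    messages = build_motion_messages(frame1, frame2)
    # Render the chat messages into the prompt string, with image placeholders.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[text], images=[frame1, frame2], return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the model's answer is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

Calling `describe_motion(model, processor, frame1, frame2)` should return the model's answer as a string, which is expected to contain a `<motion .../>` annotation like the one shown in the Description.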
Related Models
- pitvqa-qwen2vl-unified-v2 - Multi-task model that includes motion detection (Stage 3)