Instructions to use dmis-lab/Qwen3-VL-8B-Instruct-MRPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use dmis-lab/Qwen3-VL-8B-Instruct-MRPO with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="dmis-lab/Qwen3-VL-8B-Instruct-MRPO")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("dmis-lab/Qwen3-VL-8B-Instruct-MRPO")
model = AutoModelForMultimodalLM.from_pretrained("dmis-lab/Qwen3-VL-8B-Instruct-MRPO")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use dmis-lab/Qwen3-VL-8B-Instruct-MRPO with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dmis-lab/Qwen3-VL-8B-Instruct-MRPO"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dmis-lab/Qwen3-VL-8B-Instruct-MRPO",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/dmis-lab/Qwen3-VL-8B-Instruct-MRPO

SGLang

How to use dmis-lab/Qwen3-VL-8B-Instruct-MRPO with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dmis-lab/Qwen3-VL-8B-Instruct-MRPO" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dmis-lab/Qwen3-VL-8B-Instruct-MRPO",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dmis-lab/Qwen3-VL-8B-Instruct-MRPO" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dmis-lab/Qwen3-VL-8B-Instruct-MRPO",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use dmis-lab/Qwen3-VL-8B-Instruct-MRPO with Docker Model Runner:
```
docker model run hf.co/dmis-lab/Qwen3-VL-8B-Instruct-MRPO
```

Qwen3-VL-8B-Instruct-MRPO

MRPO is a novel reinforcement learning framework that improves medical multimodal reasoning by directly addressing failures in the reasoning process. It reshapes GRPO-style advantages using both answer-level and step-wise process rewards, assigning exponentially larger penalties to earlier invalid steps when the final answer is incorrect, thereby correcting early-stage failures before they cascade while preserving successful trajectories. By redistributing the learning signal according to where reasoning first fails, MRPO induces transferable reasoning that improves both reasoning quality and final answer accuracy across diverse medical VQA benchmarks.

Code: github

Project Page: page

Paper: Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning

Quick Start

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

# Load the model (MRPO Qwen3-VL checkpoint; or a local trained checkpoint path)
model_path = "dmis-lab/Qwen3-VL-8B-Instruct-MRPO"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path)

# Example usage (no system prompt; Qwen3 uses <thinking> tags for reasoning)
image_path = "path/to/medical/image.jpg"
question = "What can you see in this medical image?"

question_text = (
    f"{question} Think step-by-step and enclose your reasoning in "
    "<thinking>...</thinking> tags. Then provide your answer in <answer>...</answer> tags."
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": question_text},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[Image.open(image_path)],
    padding=True,
    padding_side="left",
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference (greedy decoding, matching inference.py)
generated_ids = model.generate(**inputs, use_cache=True, max_new_tokens=512, do_sample=False)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

Citation

@misc{jung2026breakingfailurecascadesstepaware,
      title={Breaking Failure Cascades: Step-Aware Reinforcement Learning for Medical Multimodal Reasoning}, 
      author={Junha Jung and Minbyul Jeong and Suhyeon Lim and Sungwook Jung and Jaehoon Yun and Taeyun Roh and Mujeen Sung and Jaewoo Kang},
      year={2026},
      eprint={2606.31825},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.31825}, 
}