Qwen2.5-VL-7B - Browser Agent (RL-Refined)

This model is an advanced visual web browser agent that has been refined using Group Relative Policy Optimization (GRPO). It builds upon the SFT baseline adapter, aggressively optimizing for reasoning depth and formatting compliance.

🔬 Methodology

  • Hardware: Trained natively in BFloat16 on an AMD MI300X (192 GB VRAM).
  • Framework Overrides: We bypassed standard VLM compilation bugs in Unsloth by executing a pure PyTorch eager-mode pass, disabling bitsandbytes quantization, and stripping forced gradient checkpointing to resolve ROCm autograd graph breaks.
  • Parallelism: Scaled to 8 parallel trajectory generations per step.
  • Rewards: Format adherence, JSON parsability, and task validity.
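
The exact reward functions are not published; a minimal sketch of what the format-adherence and JSON-parsability signals listed above could look like (the function names and the `<think>`-then-JSON output convention are assumptions based on this card):

```python
import json
import re

def format_reward(completion: str) -> float:
    # Reward completions that wrap their reasoning in <think>...</think>
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def json_reward(completion: str) -> float:
    # Reward completions whose action payload (text after </think>) parses as JSON
    action = completion.split("</think>")[-1].strip()
    try:
        json.loads(action)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

completion = '<think>The login button is visible.</think>{"action": "click", "selector": "#login"}'
total = format_reward(completion) + json_reward(completion)
```

In GRPO, per-completion rewards like these are computed for each of the 8 parallel trajectories and normalized within the group to form the advantage signal.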

🚀 Performance

Compared to the SFT baseline, this RL-refined model exhibits significantly superior adherence to negative constraints and complex multi-step instructions. It explicitly uses `<think>` tags to reason about the visual state of the screen before emitting strict JSON actions.

💻 Usage

```python
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model in BF16, the precision used during training
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
# Note: this adapter contains BOTH the SFT base and the RL refinements
model = PeftModel.from_pretrained(model, "ihaveadog/qwen25-vl-7b-browser-agent-v6-rl")
```