Qwen2.5-VL-7B - Browser Agent (RL-Refined)

This model is an advanced visual web browser agent that has been refined using Group Relative Policy Optimization (GRPO). It builds upon the SFT baseline adapter, aggressively optimizing for reasoning depth and formatting compliance.

🔬 Methodology

  • Hardware: Trained natively in BFloat16 on an AMD MI300X (192 GB VRAM).
  • Framework Overrides: We bypassed standard VLM compilation bugs in Unsloth by executing a pure PyTorch eager-mode pass, disabling bitsandbytes quantization, and stripping forced gradient checkpointing to resolve ROCm autograd graph breaks.
  • Parallelism: Scaled to 8 parallel trajectory generations per step.
  • Rewards: Format adherence, JSON parsability, and task validity.
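
The exact reward functions are not published; a minimal sketch of what the format-adherence and JSON-parsability signals listed above could look like (the function names and the `<think>`-then-JSON output convention are assumptions based on this card):

```python
import json
import re

def format_reward(completion: str) -> float:
    # Reward completions that wrap their reasoning in <think>...</think>
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def json_reward(completion: str) -> float:
    # Reward completions whose action payload (text after </think>) parses as JSON
    action = completion.split("</think>")[-1].strip()
    try:
        json.loads(action)
        return 1.0
    except json.JSONDecodeError:
        return 0.0

completion = '<think>The login button is visible.</think>{"action": "click", "selector": "#login"}'
total = format_reward(completion) + json_reward(completion)
```

In GRPO, per-completion rewards like these are computed for each of the 8 parallel trajectories and normalized within the group to form the advantage signal.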

🚀 Performance

Compared to the SFT baseline, this RL-refined model exhibits significantly superior adherence to negative constraints and complex multi-step instructions. It explicitly uses `<think>` tags to reason about the visual state of the screen before emitting strict JSON actions.

💻 Usage

```python
import torch
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the base model in BF16, the precision used during training
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
# Note: this adapter contains BOTH the SFT base and the RL refinements
model = PeftModel.from_pretrained(model, "ihaveadog/qwen25-vl-7b-browser-agent-v6-rl")
```