# Qwen2.5-VL-7B - Browser Agent (RL-Refined)
This model is an advanced visual web browser agent that has been refined using Group Relative Policy Optimization (GRPO). It builds upon the SFT baseline adapter, aggressively optimizing for reasoning depth and formatting compliance.
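For context, GRPO scores each sampled trajectory against the other completions in its group rather than against a learned value function. A minimal sketch of that group-relative normalization in plain Python (illustrative only, not the actual training code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each trajectory's reward against its own group:
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(rewards)
    sd = stdev(rewards) if len(rewards) > 1 else 0.0
    eps = 1e-6  # avoid division by zero when all rewards are equal
    return [(r - mu) / (sd + eps) for r in rewards]
```

With 8 parallel generations per step, each trajectory's advantage is simply how much better or worse it scored than its 7 siblings.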
## 🔬 Methodology
- Hardware: Trained natively in 16-bit BFloat16 on an AMD MI300X (192GB VRAM).
- Framework Overrides: We bypassed standard VLM compilation bugs in Unsloth by executing a pure PyTorch eager-mode pass, disabling `bitsandbytes` quantization, and stripping forced gradient checkpointing to resolve ROCm autograd graph breaks.
- Parallelism: Scaled to 8 parallel trajectory generations per step.
- Rewards: Format adherence, JSON parsability, and task validity.
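The reward terms above can be sketched as simple checks over a completion string. The tag layout, equal weighting, and action vocabulary below are illustrative assumptions, not the actual training configuration:

```python
import json
import re

# Expected shape: <think>reasoning</think> followed by a JSON action.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion has a <think> block followed by an action."""
    return 1.0 if THINK_RE.fullmatch(completion.strip()) else 0.0

def json_reward(completion: str) -> float:
    """1.0 if the text after </think> parses as JSON."""
    m = THINK_RE.fullmatch(completion.strip())
    if not m:
        return 0.0
    try:
        json.loads(m.group(2))
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def validity_reward(completion: str,
                    allowed=("click", "type", "scroll")) -> float:
    """1.0 if the parsed action names a known browser command."""
    m = THINK_RE.fullmatch(completion.strip())
    if not m:
        return 0.0
    try:
        action = json.loads(m.group(2))
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(action, dict) and action.get("action") in allowed else 0.0

def total_reward(completion: str) -> float:
    return format_reward(completion) + json_reward(completion) + validity_reward(completion)
```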
## 📈 Performance
Compared to the SFT baseline, this RL-refined model adheres far more reliably to negative constraints and complex multi-step instructions. It reasons about the visual state of the screen inside explicit `<think>` tags before emitting a strict JSON action.
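A downstream controller can split the reasoning from the action with a few lines of string handling. This is a hypothetical helper; the exact output contract is an assumption based on the tag format described above:

```python
import json

def parse_agent_output(text: str) -> tuple[str, dict]:
    """Split a completion into (reasoning, action): the free-form text
    inside <think>...</think> and the JSON action that follows it."""
    reasoning, _, action_str = text.partition("</think>")
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, json.loads(action_str.strip())
```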
## 💻 Usage
```python
from peft import PeftModel
from transformers import Qwen2_5_VLForConditionalGeneration

# Note: this is a merged adapter containing BOTH the SFT base and the RL refinements
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", device_map="auto"
)
model = PeftModel.from_pretrained(model, "ihaveadog/qwen25-vl-7b-browser-agent-v6-rl")
```
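To run a browser step, you would pass a screenshot plus an instruction through the standard Qwen2.5-VL chat format. A minimal sketch of the message payload (the single-turn, image-then-text layout is an assumption about what the adapter expects):

```python
def build_messages(instruction: str, screenshot_path: str) -> list[dict]:
    """Build a single-turn Qwen2.5-VL chat message: one screenshot
    followed by the user's browsing instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": screenshot_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]
```

The resulting list is what you would hand to the processor's `apply_chat_template` before calling `model.generate`.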
## Model tree
- Base model: Qwen/Qwen2.5-VL-7B-Instruct