mariocde/Qwen3-4B-GRPO-Browser-Agent


Qwen3-4B GRPO Browser Agent

Qwen3-4B fine-tuned with GRPO (Group Relative Policy Optimization) for use as an on-device browser agent (TinyBrowser/Wiegand).

Training

GRPO: 20 iterations with group_size=8. Reward rose from 5.12 to 7.25, and JSON validity from 95% to 100%.
Duration: 83 minutes on a Tinker GPU cluster.
Reward weights: valid_json (3x), correct_action (2x), element_exists (1.5x), task_progress (1x), length (0.5x).
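
The card lists the reward components and their weights but not how they are combined or how the group of 8 samples is scored. The sketch below is one plausible reading, not the author's training code: it assumes each component produces a score in [0, 1], that the total reward is the weighted sum, and that advantages are computed GRPO-style by normalizing each reward against the mean and standard deviation of its own group. The function names, the [0, 1] score range, the use of population standard deviation, and the `eps` term are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Weights copied from the reward list in this card.
REWARD_WEIGHTS = {
    "valid_json": 3.0,
    "correct_action": 2.0,
    "element_exists": 1.5,
    "task_progress": 1.0,
    "length": 0.5,
}


def combined_reward(scores):
    """Weighted sum of per-component scores.

    Assumption: each score lies in [0, 1]; missing components count as 0.
    """
    return sum(REWARD_WEIGHTS[name] * scores.get(name, 0.0) for name in REWARD_WEIGHTS)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every sample in the group scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With group_size=8, `group_relative_advantages` would receive the eight `combined_reward` values for completions sampled from the same prompt. Samples scoring above the group mean get a positive advantage and those below get a negative one.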

Files

qwen3-4b-grpo-q4_0.gguf — Q4_0 GGUF for llama.cpp
adapter_model.safetensors — LoRA adapter weights
metrics.jsonl — Training metrics per iteration
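
The card does not document the fields in metrics.jsonl, so the sketch below assumes only the JSON Lines convention of one JSON object per line. The helper name `load_jsonl` is hypothetical. It may be a useful starting point for inspecting the per-iteration metrics.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `records = load_jsonl("metrics.jsonl")` followed by `print(records[0])` would show which keys the training run logged, assuming the file has been downloaded to the working directory.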


GGUF

Model size: 4B params
Architecture: qwen3
Quantization: 4-bit
