gui360-cooperative-lora-rl-step25

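Cooperative LoRA adapter after RL (GSPO) training, checkpoint at step 25. TSR = 20.8%, StepSR = 67.9%. This is the best LoRA result in the ranking below: about 94% of the full-parameter SFT TSR (20.8% vs. 22.2%) with ~142M parameters.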

Base Model

  • Qwen2.5-VL-7B-Instruct

Training Data

  • GUI-360 balanced 2K episodes (17,264 steps)
  • Action types: click, type, swipe (balanced sampling)

Evaluation (GUI-360 test 1K balanced)

Metric                       Value
TSR (Task Success Rate)      20.8%
StepSR (Step Success Rate)   67.9%
Progress                     34.5%
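
The gap between StepSR and TSR is easier to read with the metric definitions in hand. The sketch below is illustrative only and is not the evaluation code behind the numbers above. It assumes the common definitions: StepSR is the fraction of all steps predicted correctly, and TSR is the fraction of episodes in which every step is correct. The function names and the toy episode data are hypothetical. Progress is left out because this card does not define it.

```python
from typing import Dict, List


def step_success_rate(episodes: Dict[str, List[bool]]) -> float:
    """Fraction of all steps, pooled across episodes, that are correct (assumed definition)."""
    steps = [ok for results in episodes.values() for ok in results]
    return sum(steps) / len(steps) if steps else 0.0


def task_success_rate(episodes: Dict[str, List[bool]]) -> float:
    """Fraction of episodes in which every step is correct (assumed definition)."""
    if not episodes:
        return 0.0
    solved = sum(1 for results in episodes.values() if results and all(results))
    return solved / len(episodes)


# Hypothetical per-step outcomes for four toy episodes (True = step judged correct).
toy = {
    "ep_a": [True, True, True],
    "ep_b": [True, False, True],
    "ep_c": [True, True],
    "ep_d": [False, True],
}

print(f"StepSR = {step_success_rate(toy):.1%}")  # 8 of 10 steps correct
print(f"TSR    = {task_success_rate(toy):.1%}")  # 2 of 4 episodes fully correct
```

Under these definitions a single wrong step fails the whole episode, so TSR falls much faster than StepSR as episodes get longer.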

Full Ranking

#   Method                          TSR     Params
1   Full-param SFT step-250         22.2%   7.6B
2   V15 Cooperative RL step-25      20.8%   ~142M
3   PEFT Cooperative r=128 (SVD)    18.6%   ~67M
4   PEFT Standard r=128 (SVD)       18.1%   ~67M
5   Base model (zero-shot)          2.4%    —
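
The "94% of full SFT" figure can be checked directly against the ranking. The short sketch below redoes that arithmetic with the TSR and parameter values listed above. The variable names are illustrative, and the parameter counts are the approximate values from the table, not measured here.

```python
# TSR (%) and parameter counts as listed in the ranking above.
full_sft_tsr = 22.2      # Full-param SFT step-250
coop_rl_tsr = 20.8       # V15 Cooperative RL step-25
full_sft_params = 7.6e9  # 7.6B
coop_rl_params = 142e6   # ~142M (approximate)

# TSR of this model relative to the full-parameter SFT baseline.
relative_tsr = coop_rl_tsr / full_sft_tsr

# Parameter count of this model as a fraction of the full model.
param_fraction = coop_rl_params / full_sft_params

print(f"Relative TSR:       {relative_tsr:.1%}")   # 93.7%
print(f"Parameter fraction: {param_fraction:.1%}") # 1.9%
```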

Citation

Part of the Cooperative LoRA research for GUI agents.

