gui360-cooperative-lora-rl-step25

Cooperative LoRA after RL (GSPO) training, checkpoint step 25. TSR = 20.8%, StepSR = 67.9%. This is the best LoRA result in the comparison below: it reaches 94% of the full-parameter SFT TSR (20.8% vs. 22.2%) with ~142M parameters.

Base Model

#	Method	TSR	Params
1	Full-param SFT step-250	22.2%	7.6B
2	V15 Cooperative RL step-25	20.8%	~142M
3	PEFT Cooperative r=128 (SVD)	18.6%	~67M
4	PEFT Standard r=128 (SVD)	18.1%	~67M
5	Base model (zero-shot)	2.4%	—
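
The "94% of full SFT" figure follows directly from the table. As a minimal sketch, the arithmetic can be reproduced as below; the TSR values and parameter counts are copied from the table (the counts are the approximate "~" figures), and the helper name `relative_to_full_sft` is purely illustrative, not part of this repository.

```python
# TSR (%) and approximate parameter counts, copied from the table above.
results = {
    "Full-param SFT step-250": (22.2, 7.6e9),
    "V15 Cooperative RL step-25": (20.8, 142e6),
    "PEFT Cooperative r=128 (SVD)": (18.6, 67e6),
    "PEFT Standard r=128 (SVD)": (18.1, 67e6),
}


def relative_to_full_sft(rows, baseline="Full-param SFT step-250"):
    """Return (TSR ratio, parameter ratio) of each row against the baseline row."""
    base_tsr, base_params = rows[baseline]
    return {
        name: (tsr / base_tsr, params / base_params)
        for name, (tsr, params) in rows.items()
    }


summary = relative_to_full_sft(results)

for name, (tsr_ratio, param_ratio) in summary.items():
    print(f"{name}: {tsr_ratio:.1%} of full-SFT TSR, "
          f"{param_ratio:.2%} of full-SFT parameters")
```

For this checkpoint the TSR ratio is 20.8 / 22.2, which rounds to 94%, at roughly 1.9% of the full model's parameter count.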

Part of the Cooperative LoRA research for GUI agents.
