Papers
arxiv:2603.24533

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Published on Mar 25
· Submitted by taesiri on Mar 26
#3 Paper of the day
Abstract

A two-stage self-evolving mobile GUI agent named UI-Voyager is proposed, featuring rejection fine-tuning and group relative self-distillation to improve efficiency and performance in GUI automation tasks.

AI-generated summary

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
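The abstract describes the two stages only at a high level, but the core mechanics can be sketched: rejection fine-tuning keeps only successful rollouts as training data, while GRSD pairs each failed trajectory with a successful one from the same group, locates the fork point where they diverge, and turns the successful steps from that point on into dense step-level supervision. The trajectory format and the action-based divergence test below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of the two-stage loop described in the abstract.
# Trajectory schema and the action-comparison fork test are assumptions
# made for illustration; the paper does not specify them here.

def rejection_filter(rollouts):
    """Stage 1 (RFT): keep only successful rollouts as fine-tuning data."""
    return [t for t in rollouts if t["success"]]

def find_fork_point(success_steps, failure_steps):
    """Return the index of the first step where the failed trajectory's
    action diverges from the successful one's (the 'fork point')."""
    for i, (s, f) in enumerate(zip(success_steps, failure_steps)):
        if s["action"] != f["action"]:
            return i
    # No divergence within the shared prefix; fork at the shorter length.
    return min(len(success_steps), len(failure_steps))

def build_step_supervision(group):
    """Stage 2 (GRSD): pair each failed trajectory in a group rollout with
    a successful one and emit step-level corrections from the fork onward."""
    successes = [t for t in group if t["success"]]
    failures = [t for t in group if not t["success"]]
    corrections = []
    for fail in failures:
        for succ in successes:
            k = find_fork_point(succ["steps"], fail["steps"])
            # Supervise with the successful rollout's actions from the fork.
            for step in succ["steps"][k:]:
                corrections.append(
                    {"observation": step["observation"],
                     "target_action": step["action"]}
                )
    return corrections
```

In this reading, the fork point gives the credit assignment the abstract mentions: instead of one sparse task-level reward, every step after the divergence gets a concrete target action taken from a trajectory that succeeded.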

Community

Paper submitter

The GRSD trick is the standout here: it turns failed steps into supervision by pinpointing fork points in group rollouts and injecting dense, step-level corrections drawn from successful trajectories.

I'm curious how they define forks in practice. Are forks based on action sequences, screen states, or some learned similarity metric, and how granular is the cutoff?

There's also a risk that self-distillation could amplify early biases, since the model's initial preferences gate what gets labeled as correct.

Still, the two-stage loop of rejection fine-tuning plus GRSD seems practical for real apps, and 81.0% Pass@1 on AndroidWorld from a 4B model is solid.

By the way, the ArxivLens breakdown helped me parse the method details; here's a walkthrough that covers UI-Voyager and this two-stage loop: https://arxivlens.com/PaperView/Details/ui-voyager-a-self-evolving-gui-agent-learning-via-failed-experience-7726-c7de3304


Get this paper in your agent:

hf papers read 2603.24533
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0


Spaces citing this paper 0


Collections including this paper 1