Papers
arxiv:2601.20732

Continual GUI Agents

Published on Jan 28 · Submitted by Rajkumar rawal on Feb 2
Abstract

The Continual GUI Agents framework addresses performance degradation in dynamic digital environments through reinforcement fine-tuning with novel anchoring rewards that stabilize learning across shifting UI domains and resolutions.

AI-generated summary

As digital environments (data distributions) are in flux, with new GUI data arriving over time and introducing new domains or resolutions, agents trained on static environments deteriorate in performance. In this work, we introduce Continual GUI Agents, a new task that requires GUI agents to perform continual learning under shifted domains and resolutions. We find that existing methods fail to maintain stable grounding as GUI distributions shift over time, due to the diversity of UI interaction points and regions in these changing scenarios. To address this, we introduce GUI-Anchoring in Flux (GUI-AiF), a new reinforcement fine-tuning framework that stabilizes continual learning through two novel rewards: the Anchoring Point Reward in Flux (APR-iF) and the Anchoring Region Reward in Flux (ARR-iF). These rewards guide agents to align with shifting interaction points and regions, mitigating the tendency of existing reward strategies to over-adapt to static grounding cues (e.g., fixed coordinates or element scales). Extensive experiments show that GUI-AiF surpasses state-of-the-art baselines. Our work establishes the first continual learning framework for GUI agents, revealing the untapped potential of reinforcement fine-tuning for continual GUI agents.

Community

Some of the observations from the paper:

-- Static GUI training breaks under real-world change:
GUI agents trained on fixed datasets degrade badly when UI domains (mobile -> desktop -> web) or resolutions (1080p -> 4K) shift, mainly due to unstable grounding of interaction points and regions.
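
One concrete way to see the resolution problem: a click target memorized in raw pixel coordinates at 1080p lands in the wrong place at 4K. The sketch below (illustrative helper names, not from the paper) shows how normalizing coordinates to the screen size makes a grounding prediction resolution-independent.

```python
# Illustrative sketch: pixel-space grounding breaks under resolution shift,
# while normalized [0, 1] coordinates survive it. Function names here are
# hypothetical, not the paper's API.

def to_normalized(x_px, y_px, width, height):
    """Map pixel coordinates to resolution-independent [0, 1] coordinates."""
    return x_px / width, y_px / height

def to_pixels(x_norm, y_norm, width, height):
    """Map normalized coordinates back to pixels for a target resolution."""
    return round(x_norm * width), round(y_norm * height)

# A button centered at (960, 540) on a 1920x1080 screen...
x_n, y_n = to_normalized(960, 540, 1920, 1080)
# ...stays centered when the same UI is rendered at 3840x2160 (4K).
x_4k, y_4k = to_pixels(x_n, y_n, 3840, 2160)
# Reusing the raw pixels (960, 540) at 4K would instead hit the
# upper-left quadrant of the screen.
```

An agent that memorizes raw coordinates during static training implicitly bakes in one resolution; this is the kind of static grounding cue the paper argues against.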

-- SFT overfits; RFT is more suitable for continual learning:
Supervised fine-tuning (SFT) memorizes current layouts and forgets prior skills, while reinforcement fine-tuning (RFT) better preserves knowledge via KL-regularized updates, making it a stronger base for continual GUI agents.
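
The KL-regularization idea can be sketched in a few lines: the fine-tuned policy earns the task reward but pays a penalty for drifting from a frozen reference policy, which discourages catastrophic forgetting. This is a minimal toy illustration of the general mechanism, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a KL-regularized RFT
# objective: maximize reward while staying close to the reference policy.
import math

def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rft_objective(reward, policy, reference, beta=0.1):
    """Task reward minus a KL penalty toward the frozen reference policy."""
    return reward - beta * kl_divergence(policy, reference)

reference = [0.25, 0.25, 0.25, 0.25]    # action distribution before tuning
mild_update = [0.40, 0.30, 0.20, 0.10]  # adapts but stays near the reference
collapse = [0.97, 0.01, 0.01, 0.01]     # over-fits the current layout

# With equal task reward, the KL penalty favors the milder update,
# preserving behavior learned before the distribution shift.
better = rft_objective(1.0, mild_update, reference)
worse = rft_objective(1.0, collapse, reference)
```

The `beta` coefficient trades off plasticity (learning the new domain) against stability (retaining old skills); its value here is arbitrary.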

-- Grounding fails because interaction points and scales are in flux:
Domain and resolution shifts cause large changes in element locations and sizes, and existing reward designs over-adapt to static coordinates or scales, leading to poor generalization.

-- GUI-AiF stabilizes learning by rewarding diversity, not fixation:
The proposed GUI-AiF framework introduces two rewards, APR-iF (diverse interaction points) and ARR-iF (diverse element regions), which prevent agents from collapsing onto single layouts.
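
To make the "reward diversity, not fixation" idea concrete, here is a hypothetical reward sketch in the spirit of APR-iF (the paper's exact formulation may differ): each predicted click earns a correctness reward for landing inside the target element, plus a batch-level bonus for spreading predictions over the region instead of collapsing onto one fixed coordinate.

```python
# Hypothetical diversity-aware grounding reward, in the spirit of APR-iF.
# All names and the lambda weighting are assumptions for illustration.

def inside(point, region):
    """True if (x, y) lies within region = (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom

def diversity_bonus(points):
    """Mean pairwise L1 spread of predicted points (0 if all collapse)."""
    n = len(points)
    if n < 2:
        return 0.0
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
                for i, a in enumerate(points) for b in points[i + 1:])
    return total / (n * (n - 1) / 2)

def anchoring_reward(points, region, lam=0.01):
    """Fraction of correct clicks plus a weighted diversity term."""
    correct = sum(inside(p, region) for p in points) / len(points)
    return correct + lam * diversity_bonus(points)

region = (100, 100, 200, 150)  # one target element's bounding box
collapsed = [(150, 125)] * 4   # every click on the same fixed coordinate
spread = [(110, 105), (150, 125), (190, 145), (130, 140)]  # varied anchors
# Both batches are fully correct, but the diverse batch scores higher,
# so the agent is not pushed toward a single static grounding cue.
```

An analogous region-level term (in the spirit of ARR-iF) would reward diversity over predicted bounding boxes rather than click points.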

-- Generalizing interaction points matters more than scale :
Ablations show APR-iF contributes more than ARR-iF, indicating that adapting to where to interact is more critical than adapting to how big elements are in continual GUI environments.

...and many more.

