Improve model card: Add pipeline tag, library, abstract, and overview visuals
#1 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,8 +1,34 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
-
-This repository contains the efficient GUI grounding model, **UI-S1-7B**, presented in [UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning](https://huggingface.co/papers/2509.11543).
+# UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
 
-
+This repository contains the efficient GUI grounding model, **UI-S1-7B**, presented in the paper [UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning](https://huggingface.co/papers/2509.11543).
+
+Project page / Code: [https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1](https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1)
+
+## Paper Abstract
+Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execution due to the lack of trajectory-level reward signals; online RL captures these signals through environment interaction, but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout process, we preserve the original model output within the multi-turn dialogue, where a Patch Module adaptively recovers the divergence between rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance, serving as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning.
+
+## Overview
+
+We present **Semi-online RL**, a novel paradigm that simulates online reinforcement learning using offline trajectories, thereby enabling the efficient training of MLLM-based GUI agents with enhanced multi-turn interaction capabilities.
+
+<div align="center">
+<img src="https://github.com/X-PLUG/MobileAgent/raw/main/UI-S1/assets/method_comparison.png" alt="Method Comparison" style="width:80%;">
+</div>
+
+Our **UI-S1-7B** achieves SOTA performance on both the semi-online metric (SOP) and the online metric (AndroidWorld) among open-source 7B models.
+
+<div align="center">
+<img src="https://github.com/X-PLUG/MobileAgent/raw/main/UI-S1/assets/metric.png" alt="Metrics" style="width:80%;">
+</div>
+
+## Detailed results
+
+<div align="center">
+<img src="https://github.com/X-PLUG/MobileAgent/raw/main/UI-S1/assets/result.png" alt="Results" style="width:80%;">
+</div>
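
Since the PR adds `pipeline_tag: image-text-to-text` and `library_name: transformers` to the metadata, a minimal usage sketch for the card might look like the following. This is an illustrative sketch, not code from the PR or the UI-S1 repository: the repo id is a placeholder, and the instruction, prompt format, and generation settings are assumptions; the exact action space and prompt format expected by the agent should be taken from the project page.

```python
# Illustrative usage sketch, assuming the checkpoint follows the standard
# transformers image-text-to-text chat API (as the new metadata suggests).
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "X-PLUG/UI-S1-7B"  # placeholder: use this repository's actual Hub id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# One GUI-agent turn: a screenshot plus a natural-language instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": Image.open("screenshot.png")},
            {"type": "text", "text": "Open the Settings app and enable Wi-Fi."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the predicted GUI action).
print(
    processor.batch_decode(
        output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
)
```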
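The abstract's key training idea, discounted future returns combined with weighted step-level and episode-level advantages, can be made concrete with a small sketch. This is not the authors' implementation: `gamma`, the weights, and the baselines below are invented for illustration only.

```python
# Illustrative sketch of "discounted future returns" plus a weighted mix of
# step-level and episode-level advantages, as described in the abstract.
from typing import List


def discounted_returns(step_rewards: List[float], gamma: float = 0.9) -> List[float]:
    """R_t = r_t + gamma * R_{t+1}: each step's reward includes its discounted future."""
    returns = [0.0] * len(step_rewards)
    running = 0.0
    for t in reversed(range(len(step_rewards))):
        running = step_rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_advantages(step_rewards, episode_reward, baseline_step, baseline_episode,
                        gamma=0.9, w_step=0.5, w_episode=0.5):
    """Weighted sum of step-level advantages (vs. a per-step baseline) and a
    single episode-level advantage broadcast over all steps."""
    step_adv = [R - b for R, b in zip(discounted_returns(step_rewards, gamma), baseline_step)]
    episode_adv = episode_reward - baseline_episode
    return [w_step * a + w_episode * episode_adv for a in step_adv]


# Example: a 4-step trajectory where only the last two steps match the expert.
adv = combined_advantages(
    step_rewards=[0.0, 0.0, 1.0, 1.0],
    episode_reward=1.0,
    baseline_step=[0.5, 0.5, 0.5, 0.5],
    baseline_episode=0.5,
)
print(adv)  # early steps still get credit via the discounted future returns
```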