hehehahi4/CES

Add model card and metadata for CES

by nielsr HF Staff - opened Mar 5

base: refs/heads/main ← from: refs/pr/1

README.md +46 -3

@@ -1,3 +1,46 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-text-to-text
+tags:
+- gui-agent
+- reinforcement-learning
+- multi-agent
+- vlm
+---
+# Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
+This repository provides the models and code for the **CES (Coordinator-Executor-State Tracker)** framework, a multi-agent system designed to handle long-horizon GUI automation tasks.
+- **Paper:** [Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation](https://huggingface.co/papers/2511.22235)
+- **Code:** [Official GitHub Repository](https://github.com/hehehahi4/CES)
+## Introduction
+The CES framework addresses the limitations of single-agent GUI models in long-horizon tasks by decoupling high-level strategic planning from low-level execution. The system consists of three specialized components:
+*   **Coordinator:** Responsible for strategic planning and task decomposition.
+*   **State Tracker:** Compresses context and organizes task information to maintain task coherence and state awareness.
+*   **Executor:** A low-level model (such as [GUI-R1](https://github.com/ritzz-ai/GUI-R1)) that executes the grounded operations.
+The high-level scheduling modules (Coordinator and State Tracker) are trained using a staged execution-feedback reinforcement learning algorithm. This design makes them generalizable, plug-and-play modules that can be integrated with various Executor models to significantly enhance their planning and state management capabilities.
+## Key Features
+*   **Multi-Agent Decoupling:** Decouples high-level reasoning from execution, resolving responsibility coupling and capability conflicts.
+*   **State Context Compression:** Addresses the loss of state awareness in long-horizon tasks through dynamic summarization.
+*   **Staged Execution-Feedback RL:** A training strategy that freezes a pre-trained Executor and uses its reward signals to train the scheduling models.
+## Citation
+If you find this work helpful, please cite the following paper:
+```bibtex
+@article{deng2025training,
+  title={Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation},
+  author={Deng, Zehao and Ju, Tianjie and Wu, Zheng and Zhang, Zhuosheng and Liu, Gongshen},
+  journal={arXiv preprint arXiv:2511.22235},
+  year={2025}
+}
+```
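
Note (not part of the diff): the Coordinator / Executor / State Tracker split described in the Introduction can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions. The function names, the `"DONE"` sentinel, the summary string format, and the toy stand-ins are all hypothetical and are not taken from the CES repository; in the real system each role would be a model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical role signatures (assumptions for illustration only).
Coordinator = Callable[[str, str], str]        # (task, state_summary) -> next subtask or "DONE"
Executor = Callable[[str], str]                # (subtask) -> description of the executed action
StateTracker = Callable[[str, str, str], str]  # (prev_summary, subtask, action) -> new summary


@dataclass
class StepRecord:
    subtask: str
    action: str


def run_episode(task: str, coordinator: Coordinator, executor: Executor,
                state_tracker: StateTracker,
                max_steps: int = 20) -> Tuple[List[StepRecord], str]:
    """Plan -> execute -> compress state, until the coordinator signals completion."""
    summary = ""                       # compressed state, replaces the raw history
    history: List[StepRecord] = []
    for _ in range(max_steps):
        subtask = coordinator(task, summary)
        if subtask == "DONE":
            break
        action = executor(subtask)
        summary = state_tracker(summary, subtask, action)
        history.append(StepRecord(subtask, action))
    return history, summary


# Toy stand-ins so the loop can run without any model.
PLAN = ["open settings", "tap wifi", "toggle switch"]


def _done_count(summary: str) -> int:
    # Toy summary format: "done=<n>; last=<subtask>"
    return int(summary.split(";")[0].split("=")[1]) if summary else 0


def toy_coordinator(task: str, summary: str) -> str:
    n = _done_count(summary)
    return PLAN[n] if n < len(PLAN) else "DONE"


def toy_executor(subtask: str) -> str:
    return f"click({subtask!r})"


def toy_state_tracker(prev_summary: str, subtask: str, action: str) -> str:
    return f"done={_done_count(prev_summary) + 1}; last={subtask}"


history, summary = run_episode("Turn on Wi-Fi", toy_coordinator,
                               toy_executor, toy_state_tracker)
```

The point of the sketch is that the Coordinator only ever sees the task and the compressed summary, never the Executor's raw trajectory, which is the decoupling the README describes. Swapping `toy_executor` for a different executor leaves the other two roles unchanged.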