Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents

Yurun Song*, Jiong Yin*, Rongjunchen Zhang, Ian Harris

📖Paper | 💻Code | 🤗Model-3b-3ao | 🤗Model-7b-3ao

CCPO is a multimodal GUI agent framework that couples visual compression with policy optimization. It introduces Coordinate-Aware Spatial Compression (CASC), which aggregates coordinates from multiple rollouts to identify target-relevant regions and progressively narrows the attention paid to historical screenshots to those key visual areas. We further design a Distance-Based Advantage that provides fine-grained learning signals based on distance rather than binary correctness. Extensive experiments show that CCPO achieves state-of-the-art (SOTA) performance across four benchmarks, with up to 55% token compression and a 3.8x training speedup.
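To make the two ideas above more concrete, here is a minimal Python sketch. It is not the implementation from the paper or the repository: the function names `focus_region` and `distance_advantages`, the padded bounding-box aggregation, the linear distance reward, and the group normalization are all assumptions chosen for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def focus_region(points: List[Point], width: float, height: float,
                 margin: float = 0.1) -> Box:
    """Padded bounding box around coordinates predicted by several rollouts.

    Hypothetical stand-in for CASC-style aggregation: the box is clipped to
    the screenshot, and everything outside it could be dropped or downsampled.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    pad_x, pad_y = margin * width, margin * height
    return (
        max(0.0, min(xs) - pad_x),
        max(0.0, min(ys) - pad_y),
        min(width, max(xs) + pad_x),
        min(height, max(ys) + pad_y),
    )


def distance_advantages(points: List[Point], target: Point,
                        diagonal: float) -> List[float]:
    """Group-normalized advantages from a distance-shaped reward.

    Assumed reward form: 1 at the target, decaying linearly to 0 at one
    screen diagonal away, so near misses score higher than far misses.
    """
    rewards = [max(0.0, 1.0 - math.dist(p, target) / diagonal) for p in points]
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    # Small epsilon avoids division by zero when all rollouts tie.
    return [(r - mean) / (std + 1e-8) for r in rewards]


if __name__ == "__main__":
    rollouts = [(100.0, 100.0), (120.0, 110.0), (400.0, 300.0)]
    print(focus_region(rollouts, width=1000.0, height=800.0))
    print(distance_advantages(rollouts, target=(110.0, 105.0),
                              diagonal=math.hypot(1000.0, 800.0)))
```

In this sketch, rollouts that land closer to the target receive a larger advantage than distant ones, whereas a binary hit-or-miss reward would treat all misses identically.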

Quick Start

Please refer to our GitHub repository for the full agent pipeline, including the CASC coordinate compression mechanism and the training and evaluation scripts.

Citation

If you find this model useful, please cite our paper:

@inproceedings{song2026ccpo,
  title={Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents},
  author={Yurun Song and Jiong Yin and Rongjunchen Zhang and Ian Harris},
  booktitle={xxx},
  year={2026}
}

Model size: 4B params (Safetensors). Tensor type: BF16.