metadata
language:
  - en
license: mit
pipeline_tag: image-text-to-text
library_name: transformers

Mobile-Agent-v3.5 (GUI-Owl-1.5)

Mobile-Agent-v3.5, also referred to as GUI-Owl-1.5, is a family of native multi-platform GUI agent foundation models. It is designed for cloud-edge collaboration and real-time interaction across desktop (Windows, Linux, macOS), mobile (Android), and browser environments.

Model Description

GUI-Owl-1.5 is a native end-to-end multimodal agent that unifies perception, grounding, reasoning, planning, and action execution within a single policy network. It achieves state-of-the-art results on more than 20 GUI benchmarks, including OSWorld, AndroidWorld, and WebArena.
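To make the "single policy network" idea concrete, here is a minimal sketch of the control loop such a model could sit in. At every step, one policy call consumes the task, the current screenshot, and the step history, and returns both a thought and an action. The `policy`, `observe`, and `execute` callables, the action dictionaries, and the `"terminate"` action are hypothetical placeholders, not GUI-Owl-1.5's actual interface. The scripted policy stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    thought: str  # reasoning text emitted by the policy
    action: dict  # hypothetical action format, e.g. {"type": "click", "x": 120, "y": 480}


@dataclass
class Episode:
    task: str
    history: list[Step] = field(default_factory=list)


# Hypothetical policy signature: (task, screenshot, history) -> Step
Policy = Callable[[str, bytes, list[Step]], Step]


def run_episode(task: str, policy: Policy, observe: Callable[[], bytes],
                execute: Callable[[dict], None], max_steps: int = 20) -> Episode:
    """Drive one end-to-end policy until it emits a "terminate" action or hits max_steps."""
    episode = Episode(task)
    for _ in range(max_steps):
        screenshot = observe()                            # perception input for this step
        step = policy(task, screenshot, episode.history)  # one call: reason, ground, plan, act
        episode.history.append(step)
        if step.action["type"] == "terminate":
            break
        execute(step.action)                              # apply the action to the environment
    return episode


# Usage with a scripted stand-in for the model: two clicks, then stop.
def scripted_policy(task: str, screenshot: bytes, history: list[Step]) -> Step:
    if len(history) < 2:
        return Step(f"click target {len(history)}",
                    {"type": "click", "x": 10 * len(history), "y": 5})
    return Step("task complete", {"type": "terminate"})


executed: list[dict] = []
ep = run_episode("open settings", scripted_policy,
                 observe=lambda: b"", execute=executed.append)
print(len(ep.history), executed)
```

Because perception, grounding, reasoning, and planning all happen inside the single `policy` call, the loop itself needs no separate planner or grounding module; swapping in a different model changes only that callable.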

Key features include:

  • Multi-Platform Support: Unified automation capabilities for mobile, desktop, and web environments.
  • Thinking & Instruct Variants: Instruct models optimized for direct instruction following, and Thinking models optimized for complex reasoning.
  • Unified Reasoning: A thought-synthesis pipeline that strengthens decision-making and tool calling via the Model Context Protocol (MCP).
  • MRPO Algorithm: A new reinforcement learning (RL) algorithm for training in interactive environments, designed to resolve conflicts between platforms and improve training efficiency on long-horizon tasks.
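One way to picture "unified automation" across platforms is a shared, platform-neutral action vocabulary that is translated into platform-specific commands only at execution time. The sketch below illustrates that idea under stated assumptions: the `Action` schema and the `to_argv` helper are hypothetical and are not GUI-Owl-1.5's output format or executor; only the `adb shell input` (Android) and `xdotool` (Linux desktop) command-line tools it targets are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Platform-neutral GUI action (hypothetical schema, not GUI-Owl-1.5's format)."""
    kind: str       # "click" or "type" in this sketch
    x: int = 0
    y: int = 0
    text: str = ""


def to_argv(action: Action, platform: str) -> list[str]:
    """Map one neutral action to an argv list for a platform-specific backend."""
    if platform == "android":
        if action.kind == "click":
            return ["adb", "shell", "input", "tap", str(action.x), str(action.y)]
        if action.kind == "type":
            # `input text` encodes spaces as %s; other shell metacharacters are not escaped here.
            return ["adb", "shell", "input", "text", action.text.replace(" ", "%s")]
    elif platform == "linux":
        if action.kind == "click":
            return ["xdotool", "mousemove", str(action.x), str(action.y), "click", "1"]
        if action.kind == "type":
            return ["xdotool", "type", action.text]
    raise ValueError(f"unsupported action {action.kind!r} on platform {platform!r}")


tap = Action("click", x=120, y=480)
print(to_argv(tap, "android"))
print(to_argv(tap, "linux"))
```

Keeping the model's output platform-neutral means one policy can serve every platform, while each executor stays a small, replaceable translation layer. The returned lists could be passed to `subprocess.run`, which is omitted here so the sketch has no side effects.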

Resources

Citation

If you find this model useful, please cite our paper:

@article{MobileAgentv3.5,
  title={Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents},
  author={Haiyang Xu and Xi Zhang and Haowei Liu and Junyang Wang and Zhaozai Zhu and Shengjie Zhou and Xuhao Hu and Feiyu Gao and Junjie Cao and Zihua Wang and Zhiyuan Chen and Jitong Liao and Qi Zheng and Jiahui Zeng and Ze Xu and Shuai Bai and Junyang Lin and Jingren Zhou and Ming Yan},
  journal={arXiv preprint arXiv:2602.16855},
  year={2026}
}