---
language:
- en
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---
Mobile-Agent-v3.5 (GUI-Owl-1.5)
Mobile-Agent-v3.5 (also known as GUI-Owl-1.5) is a family of native multi-platform GUI agent foundation models. Built on the Qwen3-VL architecture, the models automate tasks across desktop (Windows/macOS), mobile (Android), and web browser environments.
- Paper: Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
- Repository: https://github.com/X-PLUG/MobileAgent
- Demo: ModelScope Online Demo
Key Features
- Multi-platform GUI Automation: Unifies perception, grounding, and action execution across mobile, desktop, and web environments.
- Enhanced Agent Capabilities: Improved reasoning through a unified thought-synthesis pipeline, with a specific focus on Tool/MCP use and long-horizon memory.
- State-of-the-Art Performance: Achieves leading results among open-source models on benchmarks such as OSWorld, AndroidWorld, and WebArena.
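To make the idea of unifying grounding and action execution across platforms more concrete, here is a minimal sketch in plain Python. It is not the model's actual interface or output format. The `Action` schema, the `to_pixels` and `dispatch` helpers, the normalized `[0, 1]` coordinate convention, and the command names (`tap`, `mouse_click`, `dom_click`, `type_text`) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


# Hypothetical action schema; the model's real output format is not specified here.
@dataclass(frozen=True)
class Action:
    platform: str    # "mobile", "desktop", or "web"
    kind: str        # "click" or "type"
    x: float = 0.0   # normalized horizontal position in [0, 1] (assumed convention)
    y: float = 0.0   # normalized vertical position in [0, 1] (assumed convention)
    text: str = ""   # payload for "type" actions


def to_pixels(action, width, height):
    """Map normalized grounding coordinates to integer pixel coordinates."""
    if not (0.0 <= action.x <= 1.0 and 0.0 <= action.y <= 1.0):
        raise ValueError("coordinates must be normalized to [0, 1]")
    # Clamp so that a coordinate of exactly 1.0 maps to the last valid pixel.
    px = min(int(action.x * width), width - 1)
    py = min(int(action.y * height), height - 1)
    return px, py


def dispatch(action, width, height):
    """Render one action as a platform-specific command string (illustrative only)."""
    verbs = {"mobile": "tap", "desktop": "mouse_click", "web": "dom_click"}
    if action.platform not in verbs:
        raise ValueError(f"unknown platform: {action.platform}")
    if action.kind == "click":
        px, py = to_pixels(action, width, height)
        return f"{verbs[action.platform]}({px}, {py})"
    if action.kind == "type":
        return f"type_text({action.text!r})"
    raise ValueError(f"unsupported action kind: {action.kind}")


if __name__ == "__main__":
    click = Action(platform="mobile", kind="click", x=0.5, y=0.25)
    print(dispatch(click, 1080, 2400))
```

In this sketch the same `Action` value can be dispatched to any of the three environments, and only the screen size and the command verb change. A real agent loop would replace these string commands with calls into a device or browser automation layer.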
Citation
If you find this model useful, please cite the paper:
@article{MobileAgentv3.5,
title={Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents},
author={Haiyang Xu and Xi Zhang and Haowei Liu and Junyang Wang and Zhaozai Zhu and Shengjie Zhou and Xuhao Hu and Feiyu Gao and Junjie Cao and Zihua Wang and Zhiyuan Chen and Jitong Liao and Qi Zheng and Jiahui Zeng and Ze Xu and Shuai Bai and Junyang Lin and Jingren Zhou and Ming Yan},
journal={arXiv preprint arXiv:2602.16855},
year={2026}
}