Abstract
UI-Venus-1.5 is a unified GUI agent with improved performance through mid-training stages, online reinforcement learning, and model merging techniques.
GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
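To make the model-merging idea in point (3) concrete, here is a minimal sketch of one common merging strategy: a weighted linear average of parameters from checkpoints that share the same architecture. The abstract does not say which merging algorithm UI-Venus-1.5 uses, so this is an illustration under that assumption, not the report's method. The function name `merge_checkpoints`, the toy parameter names, and all values are hypothetical.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of floats.
# Real checkpoints hold tensors; plain lists keep this sketch dependency-free.
StateDict = Dict[str, List[float]]


def merge_checkpoints(
    checkpoints: Dict[str, StateDict],
    weights: Dict[str, float],
) -> StateDict:
    """Weighted linear average of same-architecture checkpoints (assumed method)."""
    if set(checkpoints) != set(weights):
        raise ValueError("each checkpoint needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    names = list(checkpoints)
    reference = checkpoints[names[0]]
    merged: StateDict = {}
    for param_name, ref_values in reference.items():
        acc = [0.0] * len(ref_values)
        for name in names:
            values = checkpoints[name][param_name]
            if len(values) != len(ref_values):
                raise ValueError(f"shape mismatch for {param_name!r} in {name!r}")
            w = weights[name] / total  # normalize so the weights sum to 1
            for i, v in enumerate(values):
                acc[i] += w * v
        merged[param_name] = acc
    return merged


# Hypothetical domain-specific checkpoints with made-up values.
grounding = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
web = {"layer.weight": [3.0, 4.0], "layer.bias": [3.0]}
mobile = {"layer.weight": [5.0, 6.0], "layer.bias": [6.0]}

merged = merge_checkpoints(
    {"grounding": grounding, "web": web, "mobile": mobile},
    {"grounding": 1.0, "web": 1.0, "mobile": 1.0},
)
```

With equal weights, each merged parameter is the element-wise mean of the three inputs. Unequal weights would bias the merged checkpoint toward one domain, which is one knob such a scheme exposes.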
Community
Is your GUI Agent ready for real work? 🔥
We’ve already seen many great GUI Agents, but building a "stable assistant" for phones and websites is still hard. There are three main problems:
1️⃣ Knowledge Gap: AI often misses less common icons and doesn't know how specialized apps work.
2️⃣ Reality Gap: Models that work well in tests often fail during real-life tasks.
3️⃣ Too Complex: Using a multi-agent framework usually costs too much.
Enter UI-Venus-1.5 🚀 — The new high-performance, end-to-end GUI Agent from Ant Group!
Unlike earlier approaches, UI-Venus-1.5 is built for real-world use:
📱 All-in-One: One single model for Grounding, Mobile, and Web tasks.
🇨🇳 Real App Support: Full support for 40+ popular Chinese apps, making AI part of daily life.
⚡ Simple & Fast: A clean, end-to-end design for faster and more reliable work.
Check it out and see how AI can truly help you! 🐜✨