Add metadata, paper/code links, and quick start instructions

#2
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +43 -0
README.md CHANGED
@@ -1,12 +1,39 @@
1
  ---
2
  license: apache-2.0
 
 
3
  ---
 
4
  # MAI-UI: Real-World Centric Foundation GUI Agents.
 
 
 
5
  ![overview](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/_ibfeHy_31ZRanQ3xxlnn.png)
6
 
7
  ## 📖 Background
8
  The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning the full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. We identify four key challenges to realistic deployment: the lack of native agent–user interaction, the limits of UI-only operation, the absence of a practical deployment architecture, and brittleness in dynamic environments. MAI-UI addresses these issues with a unified methodology: a self-evolving data pipeline that expands the navigation data to include user interaction and MCP tool calls, a native device–cloud collaboration system that routes execution by task state, and an online RL framework with advanced optimizations to scale parallel environments and context length.
9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
  ## 🏆 Results
11
 
12
  ### Grounding
@@ -31,3 +58,19 @@ MAI-UI establishes new state-of-the-art across GUI grounding and mobile navigati
31
  ### Device-Cloud Collaboration
32
  - Our device-cloud collaboration framework can dynamically select on-device or cloud execution based on task execution state and data sensitivity. It improves on-device performance by 33% and reduces cloud API calls by over 40%.
33
  ![dcc](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/Fm2PPxbRpASfdvVBxkjLw.jpeg)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: apache-2.0
3
+ library_name: transformers
4
+ pipeline_tag: image-text-to-text
5
  ---
6
+
7
  # MAI-UI: Real-World Centric Foundation GUI Agents.
8
+
9
+ [[📄 Paper](https://arxiv.org/abs/2512.22047)] [[🌐 Website](https://tongyi-mai.github.io/MAI-UI/)] [[💻 GitHub](https://github.com/Tongyi-MAI/MAI-UI)]
10
+
11
  ![overview](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/_ibfeHy_31ZRanQ3xxlnn.png)
12
 
13
  ## 📖 Background
14
  The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning the full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. We identify four key challenges to realistic deployment: the lack of native agent–user interaction, the limits of UI-only operation, the absence of a practical deployment architecture, and brittleness in dynamic environments. MAI-UI addresses these issues with a unified methodology: a self-evolving data pipeline that expands the navigation data to include user interaction and MCP tool calls, a native device–cloud collaboration system that routes execution by task state, and an online RL framework with advanced optimizations to scale parallel environments and context length.
15
 
16
+ ## 🚀 Quick Start
17
+
18
+ ### Deployment with vLLM
19
+ You can deploy the model using vLLM (requires `vllm>=0.11.0` and `transformers>=4.57.0`):
20
+
21
+ ```bash
22
+ # Install vLLM
23
+ pip install vllm
24
+
25
+ # Start vLLM API server
26
+ python -m vllm.entrypoints.openai.api_server \
27
+ --model Tongyi-MAI/MAI-UI-8B \
28
+ --served-model-name MAI-UI-8B \
29
+ --host 0.0.0.0 \
30
+ --port 8000 \
31
+ --tensor-parallel-size 1 \
32
+ --trust-remote-code
33
+ ```
34
+
35
+ The model will be served at `http://localhost:8000/v1`.
36
+
37
  ## 🏆 Results
38
 
39
  ### Grounding
 
58
  ### Device-Cloud Collaboration
59
  - Our device-cloud collaboration framework can dynamically select on-device or cloud execution based on task execution state and data sensitivity. It improves on-device performance by 33% and reduces cloud API calls by over 40%.
60
  ![dcc](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/Fm2PPxbRpASfdvVBxkjLw.jpeg)
61
+
62
+ ## 📝 Citation
63
+
64
+ If you find this project useful for your research, please consider citing our work:
65
+
66
+ ```bibtex
67
+ @misc{zhou2025maiuitechnicalreportrealworld,
68
+ title={MAI-UI Technical Report: Real-World Centric Foundation GUI Agents},
69
+ author={Hanzhang Zhou and Xu Zhang and Panrong Tong and Jianan Zhang and Liangyu Chen and Quyu Kong and Chenglin Cai and Chen Liu and Yue Wang and Jingren Zhou and Steven Hoi},
70
+ year={2025},
71
+ eprint={2512.22047},
72
+ archivePrefix={arXiv},
73
+ primaryClass={cs.CV},
74
+ url={https://arxiv.org/abs/2512.22047},
75
+ }
76
+ ```