---
base_model: Qwen/Qwen2-VL-7B-Instruct
library_name: transformers
license: apache-2.0
language:
- en
pipeline_tag: image-text-to-text
---

# UIPro: Unleashing Superior Interaction Capability For GUI Agents

[\[💻Code\]](https://github.com/ZJULiHongxin/UIPro) [\[🚀Quick Start\]](#uses) [\[📝Paper\]](https://arxiv.org/abs/2509.17328)

![uipro_github_banner](https://cdn-uploads.huggingface.co/production/uploads/648e5a70df53671f33e94d52/VmLuH_usPK5hZOPPnYFhS.png)

## Model Details

![uipro_mainfigure](https://cdn-uploads.huggingface.co/production/uploads/648e5a70df53671f33e94d52/Kd5yOvqpFzoRlqEEL4KAS.png)

### Model Description

- **Developed by:** Brave Group, CASIA
- **Model type:** Vision-Language Model
- **Language(s) (NLP):** English
- **License:** Apache License 2.0
- **Finetuned from model:** Qwen2-VL-7B-Instruct

### Model Sources

**HongxinLi/UIPro-7B_Stage2_Mobile** is a GUI agent model fine-tuned from Qwen2-VL-7B-Instruct. It is the mobile-oriented variant of UIPro and is designed to solve GUI agent tasks in mobile scenarios.

- **Repository:** [https://github.com/ZJULiHongxin/UIPro](https://github.com/ZJULiHongxin/UIPro)
- **Paper:** [https://arxiv.org/abs/2509.17328](https://arxiv.org/abs/2509.17328)

## Uses

### Direct Use

First, ensure that the necessary dependencies are installed:

```
pip install transformers
pip install qwen-vl-utils
```

Inference code example:

```
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Default: load the model on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "HongxinLi/UIPro-7B_Stage2_Mobile", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("HongxinLi/UIPro-7B_Stage2_Mobile")

# Replace {task} and {action_history} in the prompt below with your own
# task description and the actions taken so far.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "./web_6f93090a-81f6-489e-bb35-1a2838b18c01.png",
            },
            {"type": "text", "text": """Given the Mobile UI screenshot and previous actions, please generate the next move necessary to advance towards task completion.
The user's task is: {task} Action history: {action_history} Now, first describe the action intent and then directly plan the next action."""},
        ],
    }
]

# The remaining steps follow the standard Qwen2-VL inference recipe.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt"
).to(model.device)

# max_new_tokens=128 is an assumed value; adjust it as needed.
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so that only the newly generated text is decoded.
trimmed_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed_ids, skip_special_tokens=True))
```

## Citation

**BibTeX:**

```
@InProceedings{Li_2025_ICCV,
    author    = {Li, Hongxin and Su, Jingran and Chen, Jingfan and Ju, Zheng and Chen, Yuntao and Li, Qing and Zhang, Zhaoxiang},
    title     = {UIPro: Unleashing Superior Interaction Capability For GUI Agents},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {1613-1623}
}
```