HanXiao1999/UI-Genie-Agent-7B

@@ -3,11 +3,17 @@ base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
 datasets:
 - HanXiao1999/UI-Genie-Agent-5k
 ---
-# UI-Genie-Agent-7B
 ## Model Description
@@ -15,8 +21,6 @@ datasets:
 This model achieves state-of-the-art performance on mobile GUI benchmarks. It is trained without manual annotation, using synthetic trajectories whose generation is guided by our specialized reward model, UI-Genie-RM.
 ## Model Architecture
 - **Base Model**: [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
@@ -53,7 +57,6 @@ Our model is trained on a combination of:
 - [**AndroidLab**](https://github.com/THUDM/Android-Lab): 726 trajectories (high-level tasks)
 - [**UI-Genie-Agent-16k**](https://huggingface.co/datasets/HanXiao1999/UI-Genie-Agent-5k): 2.2K synthetic trajectories (our generated data)
 ## Action Space
 The model supports a comprehensive action space for mobile interactions:
@@ -69,7 +72,6 @@ The model supports a comprehensive action space for mobile interactions:
 | `wait` | time, action_desc | Wait operations |
 | `terminate` | status, action_desc | Task completion |
 ## Citation
 ```bibtex
@@ -82,5 +84,4 @@ The model supports a comprehensive action space for mobile interactions:
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2505.21496},
 }
-```

 - Qwen/Qwen2.5-VL-7B-Instruct
 datasets:
 - HanXiao1999/UI-Genie-Agent-5k
+pipeline_tag: image-text-to-text
+library_name: transformers
+license: mit
 ---
+# UI-Genie-Agent-7B
+This model is presented in [UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents](https://huggingface.co/papers/2505.21496).
+Code: https://github.com/Euphoria16/UI-Genie
 ## Model Description
 This model achieves state-of-the-art performance on mobile GUI benchmarks. It is trained without manual annotation, using synthetic trajectories whose generation is guided by our specialized reward model, UI-Genie-RM.
 ## Model Architecture
 - **Base Model**: [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
 - [**AndroidLab**](https://github.com/THUDM/Android-Lab): 726 trajectories (high-level tasks)
 - [**UI-Genie-Agent-16k**](https://huggingface.co/datasets/HanXiao1999/UI-Genie-Agent-5k): 2.2K synthetic trajectories (our generated data)
 ## Action Space
 The model supports a comprehensive action space for mobile interactions:
 | `wait` | time, action_desc | Wait operations |
 | `terminate` | status, action_desc | Task completion |
 ## Citation
 ```bibtex
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2505.21496},
 }
+```
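
The Action Space table lists each action together with its parameters. As a rough illustration of how a client might check a predicted action before executing it, here is a minimal sketch covering only the two rows visible in this diff (`wait` and `terminate`). The `action_type` key, the `REQUIRED_PARAMS` mapping, and the `missing_params` helper are hypothetical names chosen for this example; they are not taken from the UI-Genie code, and the model's actual output format may differ.

```python
from typing import Any

# Hypothetical schema built only from the two table rows shown in this diff.
# The remaining actions of the model card's table are not reproduced here.
REQUIRED_PARAMS: dict[str, tuple[str, ...]] = {
    "wait": ("time", "action_desc"),
    "terminate": ("status", "action_desc"),
}


def missing_params(action: dict[str, Any]) -> list[str]:
    """Return the required parameters that are absent from a predicted action."""
    name = action.get("action_type")  # assumed key name, not confirmed by the source
    if name not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action type: {name!r}")
    return [param for param in REQUIRED_PARAMS[name] if param not in action]
```

An empty list means every listed parameter is present. Raising on unknown action types keeps an unsupported action from being passed silently to the device controller.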