flin775 committed on
Commit 11176d7 · verified · 1 Parent(s): b26fa10

Update README.md

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -1,3 +1,34 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - mPLUG/GUI-Owl-7B
+ ---
+
+ This model was converted with mlx_vlm from [mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B).
+
+ ## Model Description
+ GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld.
+
+
+ ### ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_v2.jpg?raw=true" width="80%"/>
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_pro.jpg?raw=true" width="80%"/>
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/osworld_g.jpg?raw=true" width="80%"/>
+
+ ### MMBench-GUI L1, L2 and Android Control
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l1.jpg?raw=true" width="80%"/>
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l2.jpg?raw=true" width="80%"/>
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/android_control.jpg?raw=true" width="60%"/>
+
+ ### Android World and OSWorld-Verified
+ <img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/online.jpg?raw=true" width="60%"/>
+
+
+ ## Quick Start
+ ```shell
+ mlx_vlm.generate --model mlx-community/GUI-Owl-7B-4bit \
+ --max-tokens 1024 \
+ --temperature 0.0 \
+ --prompt "List all contacts’ names and their corresponding grounding boxes([x1, y1, x2, y2]) from the left sidebar of the IM chat interface, return the results in JSON format." \
+ --image https://wechat.qpic.cn/uploads/2016/05/WeChat-Windows-2.11.jpg
+ ```
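
The Quick Start prompt asks the model to return contact names and `[x1, y1, x2, y2]` boxes as JSON, but the README does not specify the exact output schema. The sketch below shows one way the generated text might be post-processed in Python. The key names `name` and `bbox`, the helper names `extract_json`, `is_box`, and `parse_contacts`, and the possibility that the JSON arrives wrapped in a Markdown fence are all assumptions made for illustration, not documented behaviour of GUI-Owl or mlx_vlm.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this snippet fence-safe


def extract_json(text):
    """Return the first JSON value found in the model output, or None."""
    # If the output contains a fenced block, look only inside it.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    decoder = json.JSONDecoder()
    # Try to decode starting at each '[' or '{' until one parses.
    for match in re.finditer(r"[\[{]", candidate):
        try:
            value, _ = decoder.raw_decode(candidate, match.start())
            return value
        except json.JSONDecodeError:
            continue
    return None


def is_box(box):
    """True for a list of four numbers with x1 <= x2 and y1 <= y2."""
    return (
        isinstance(box, list)
        and len(box) == 4
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in box)
        and box[0] <= box[2]
        and box[1] <= box[3]
    )


def parse_contacts(text):
    """Return (name, (x1, y1, x2, y2)) pairs, assuming 'name' and 'bbox' keys."""
    data = extract_json(text)
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        return []
    contacts = []
    for item in data:
        if not isinstance(item, dict):
            continue
        name, box = item.get("name"), item.get("bbox")
        if isinstance(name, str) and is_box(box):
            contacts.append((name, tuple(box)))
    return contacts
```

Entries with a missing name or a malformed box are skipped rather than raising, since generated text is not guaranteed to be valid. If the model uses different key names, adjust the two `item.get(...)` lookups accordingly.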