Update README.md
README.md
@@ -1,3 +1,34 @@
----
-license: apache-2.0
-
+---
+license: apache-2.0
+base_model:
+- mPLUG/GUI-Owl-7B
+---
+
+This model was converted to MLX format with mlx_vlm from [mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B).
+
+## Model Description
+GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld.
+
+
+### ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_v2.jpg?raw=true" width="80%"/>
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_pro.jpg?raw=true" width="80%"/>
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/osworld_g.jpg?raw=true" width="80%"/>
+
+### MMBench-GUI L1, L2 and Android Control
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l1.jpg?raw=true" width="80%"/>
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l2.jpg?raw=true" width="80%"/>
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/android_control.jpg?raw=true" width="60%"/>
+
+### Android World and OSWorld-Verified
+<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/online.jpg?raw=true" width="60%"/>
+
+
+## Quick Start
+```shell
+mlx_vlm.generate --model mlx-community/GUI-Owl-7B-4bit \
+--max-tokens 1024 \
+--temperature 0.0 \
+--prompt "List all contacts’ names and their corresponding grounding boxes([x1, y1, x2, y2]) from the left sidebar of the IM chat interface, return the results in JSON format." \
+--image https://wechat.qpic.cn/uploads/2016/05/WeChat-Windows-2.11.jpg
+```
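The Quick Start prompt asks GUI-Owl for grounding boxes in `[x1, y1, x2, y2]` form. An agent that acts on such a box often needs a single point to click, and one simple choice is the box centre. Below is a minimal bash sketch of that step. The `box_center` function is a hypothetical helper and is not part of mlx_vlm. It assumes the model returns non-negative integer pixel coordinates of the input image in exactly that order. This README does not specify the coordinate system, so check real model output before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mlx_vlm): turn a grounding box written as
# "[x1, y1, x2, y2]" into an integer click point "x y" at the box centre.
# Assumes non-negative integer pixel coordinates in exactly that order.
box_center() {
  local cleaned x1 y1 x2 y2 extra
  # Replace every character that is not a digit or a space with a space,
  # so "[12, 40, 220, 96]" becomes " 12  40  220  96 ".
  cleaned="${1//[^0-9 ]/ }"
  read -r x1 y1 x2 y2 extra <<< "$cleaned"
  # Reject input that does not contain exactly four numbers.
  if [[ -z "$y2" || -n "$extra" ]]; then
    echo "box_center: expected [x1, y1, x2, y2], got '$1'" >&2
    return 1
  fi
  # 10# forces base 10 so values like "08" are not read as octal;
  # bash integer division truncates, so odd sums round down.
  echo "$(( (10#$x1 + 10#$x2) / 2 )) $(( (10#$y1 + 10#$y2) / 2 ))"
}

box_center "[12, 40, 220, 96]"   # prints: 116 68
```

The midpoint is only one reasonable click target. Decimal or normalized coordinates are not handled by this parser.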