File size: 1,432 Bytes
11176d7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
---
license: apache-2.0
base_model:
- mPLUG/GUI-Owl-7B
---

This model is convert by mlx_vlm from [mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B/edit/main/README.md)

## Model Description
GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. 


### ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_v2.jpg?raw=true" width="80%"/>
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_pro.jpg?raw=true" width="80%"/>
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/osworld_g.jpg?raw=true" width="80%"/>

### Android World and OSWorld-Verified
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/online.jpg?raw=true" width="60%"/>


## Quick Start
```shell
mlx_vlm.generate --model mlx-community/GUI-Owl-7B-4bit \
  --max-tokens 1024 \
  --temperature 0.0 \
  --prompt "List all contacts’ names and their corresponding grounding boxes([x1, y1, x2, y2]) from the left sidebar of the IM chat interface, return the results in JSON format." \
  --image https://wechat.qpic.cn/uploads/2016/05/WeChat-Windows-2.11.jpg
```