---
license: apache-2.0
base_model:
- mPLUG/GUI-Owl-7B
---

This model was converted with mlx_vlm from [mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B).

## Model Description

GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld.

### ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G

<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_v2.jpg?raw=true" width="80%"/>

<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_pro.jpg?raw=true" width="80%"/>

<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/osworld_g.jpg?raw=true" width="80%"/>

### Android World and OSWorld-Verified

<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/online.jpg?raw=true" width="60%"/>

## Quick Start

```shell
mlx_vlm.generate --model mlx-community/GUI-Owl-7B-4bit \
  --max-tokens 1024 \
  --temperature 0.0 \
  --prompt "List all contacts’ names and their corresponding grounding boxes([x1, y1, x2, y2]) from the left sidebar of the IM chat interface, return the results in JSON format." \
  --image https://wechat.qpic.cn/uploads/2016/05/WeChat-Windows-2.11.jpg
```
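
To reuse the same command across different screenshots and prompts, a small shell function can help. The following is only a sketch: the function name `gui_owl_ground` and the variables `GUI_OWL_MODEL`, `GUI_OWL_MAX_TOKENS`, and `DRY_RUN` are hypothetical names introduced here for illustration and are not part of mlx_vlm. Only the flags already shown in the Quick Start command are used.

```shell
# Hypothetical wrapper around the Quick Start command (illustrative names only).
# Usage: gui_owl_ground IMAGE PROMPT
gui_owl_ground() {
    # Note: plain POSIX sh has no local variables, so these are global.
    image=$1
    prompt=$2

    # Build the argument list with the same flags as the Quick Start command.
    set -- mlx_vlm.generate \
        --model "${GUI_OWL_MODEL:-mlx-community/GUI-Owl-7B-4bit}" \
        --max-tokens "${GUI_OWL_MAX_TOKENS:-1024}" \
        --temperature 0.0 \
        --prompt "$prompt" \
        --image "$image"

    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command for inspection instead of running it.
        # The printed line is not shell-quoted, so do not copy-paste it blindly.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example call (assumes a local file named screenshot.png):
# gui_owl_ground screenshot.png "Return the grounding box of the search field in JSON format."
```

Setting `DRY_RUN=1` before the call prints the assembled command without loading the model, which is a cheap way to check the arguments first.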