Create README.md
Browse files
README.md
ADDED
|
@@ -0,0 +1,78 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
base_model:
|
| 6 |
+
- Qwen/Qwen2.5-VL-32B-Instruct
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
# GUI-Owl
|
| 10 |
+
|
| 11 |
+
<div align="center">
|
| 12 |
+
<img src=https://youke1.picui.cn/s1/2025/08/18/68a2f82fef3d4.png width="40%"/>
|
| 13 |
+
</div>
|
| 14 |
+
|
| 15 |
+
GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.
|
| 16 |
+
|
| 17 |
+
* **Paper**:
|
| 18 |
+
* **GitHub Repository**: https://github.com/X-PLUG/MobileAgent
|
| 19 |
+
* **Online Demo**: Comming soon
|
| 20 |
+
|
| 21 |
+
## Performance
|
| 22 |
+
|
| 23 |
+
### ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G
|
| 24 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_v2.jpg?raw=true" width="80%"/>
|
| 25 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/screenspot_pro.jpg?raw=true" width="80%"/>
|
| 26 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/osworld_g.jpg?raw=true" width="80%"/>
|
| 27 |
+
|
| 28 |
+
### MMBench-GUI L1, L2 and Android Control
|
| 29 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l1.jpg?raw=true" width="80%"/>
|
| 30 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/mmbench_gui_l2.jpg?raw=true" width="80%"/>
|
| 31 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/android_control.jpg?raw=true" width="40%"/>
|
| 32 |
+
|
| 33 |
+
### Android World and OSWorld-Verified
|
| 34 |
+
<img src="https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/online.jpg?raw=true" width="40%"/>
|
| 35 |
+
|
| 36 |
+
## Usage
|
| 37 |
+
|
| 38 |
+
Please refer to our cookbook.
|
| 39 |
+
|
| 40 |
+
## Deploy
|
| 41 |
+
|
| 42 |
+
We recommand deploy GUI-Owl-32B through vllm
|
| 43 |
+
|
| 44 |
+
This script has been validated on an A100 with 96 GB of VRAM. If you serve GUI-Owl-32B on an H20-3e, you can set MP_SIZE=1 for faster inference speed.
|
| 45 |
+
```bash
|
| 46 |
+
PIXEL_ARGS='{"min_pixels":3136,"max_pixels":10035200}'
|
| 47 |
+
IMAGE_LIMIT_ARGS='image=2'
|
| 48 |
+
MP_SIZE=2
|
| 49 |
+
MM_KWARGS=(
|
| 50 |
+
--mm-processor-kwargs $PIXEL_ARGS
|
| 51 |
+
--limit-mm-per-prompt $IMAGE_LIMIT_ARGS
|
| 52 |
+
)
|
| 53 |
+
|
| 54 |
+
vllm serve $CKPT \
|
| 55 |
+
--max-model-len 32768 ${MM_KWARGS[@]} \
|
| 56 |
+
--tensor-parallel-size $MP_SIZE \
|
| 57 |
+
--allowed-local-media-path '/' \
|
| 58 |
+
--port 4243
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
If you want GUI-Owl to recieve more than two images, you could increase `IMAGE_LIMIT_ARGS` and reduce `max_pixels`.
|
| 62 |
+
|
| 63 |
+
For example:
|
| 64 |
+
```bash
|
| 65 |
+
PIXEL_ARGS='{"min_pixels":3136,"max_pixels":3211264}'
|
| 66 |
+
IMAGE_LIMIT_ARGS='image=5'
|
| 67 |
+
MP_SIZE=2
|
| 68 |
+
MM_KWARGS=(
|
| 69 |
+
--mm-processor-kwargs $PIXEL_ARGS
|
| 70 |
+
--limit-mm-per-prompt $IMAGE_LIMIT_ARGS
|
| 71 |
+
)
|
| 72 |
+
|
| 73 |
+
vllm serve $CKPT \
|
| 74 |
+
--max-model-len 32768 ${MM_KWARGS[@]} \
|
| 75 |
+
--tensor-parallel-size $MP_SIZE \
|
| 76 |
+
--allowed-local-media-path '/' \
|
| 77 |
+
--port 4243
|
| 78 |
+
```
|