Update README.md
README.md
CHANGED
@@ -12,10 +12,20 @@ UI-TARS-1.5 is ByteDance's open-source multimodal agent built upon a powerful vi
 
 The released UI-TARS-1.5-7B focuses primarily on enhancing general computer use capabilities and is not specifically optimized for game-based scenarios, where UI-TARS-1.5 still holds a significant advantage.
 
-Here is the performance of UI-TARS-1.5-7B and UI-TARS-1.5 on OSWorld and ScreenSpotPro.
-
 | **Benchmark Type** | **Benchmark** | **UI-TARS-1.5-7B** | **UI-TARS-1.5** |
 |--------------------|------------------------------------|--------------------|-----------------|
 | Computer Use | [OSWorld](https://arxiv.org/abs/2404.07972) | 27.5 | **42.5** |
 | GUI Grounding | [ScreenSpotPro](https://arxiv.org/pdf/2504.07981v1) | 49.6 | **61.6** |
 
+P.S. The table above shows the performance of UI-TARS-1.5-7B and UI-TARS-1.5 on OSWorld and ScreenSpotPro.
+
+
+## Quick Start
+```shell
+mlx_vlm.generate --model flin775/UI-Tars-1.5-7B-4bit-mlx \
+    --max-tokens 1024 \
+    --temperature 0.0 \
+    --prompt "List all contacts’ names and their corresponding grounding boxes([x1, y1, x2, y2]) from the left sidebar of the IM chat interface, return the results in JSON format." \
+    --image https://wechat.qpic.cn/uploads/2016/05/WeChat-Windows-2.11.jpg
+```
+
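The Quick Start command added in this hunk can be wrapped in a small shell function so that the image and prompt are easy to swap. The following is a minimal sketch and not part of the commit: `ui_tars_generate`, `MODEL`, `DRY_RUN`, and the `./screenshot.png` path are hypothetical names introduced here, the flags are copied from the Quick Start command, and it assumes `mlx-vlm` is already installed and that `--image` accepts a local file path as well as a URL.

```shell
# Sketch only: ui_tars_generate, MODEL and DRY_RUN are illustrative names,
# not part of mlx-vlm or of the README change.
MODEL="flin775/UI-Tars-1.5-7B-4bit-mlx"

ui_tars_generate() {
    # $1 = image path or URL, $2 = prompt text
    image=$1
    prompt=$2
    # Rebuild the positional parameters as the full command line,
    # reusing the flags from the Quick Start command.
    set -- mlx_vlm.generate \
        --model "$MODEL" \
        --max-tokens 1024 \
        --temperature 0.0 \
        --prompt "$prompt" \
        --image "$image"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print one argument per line instead of running the model.
        printf '%s\n' "$@"
    else
        "$@"
    fi
}

# Preview the command without loading the model (hypothetical image path).
DRY_RUN=1
ui_tars_generate ./screenshot.png "Describe the visible window."
```

With `DRY_RUN=1` the function only prints the arguments, one per line, which is a cheap way to check quoting before any model weights are downloaded. Unsetting `DRY_RUN` makes the same call run the actual generation.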