---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- GUI-Agent
- GUI-Perception
- Screen-Understanding
---

## Introduction

**HAR-GUI-3B** is a GUI-tailored native model (a native end-to-end GUI agent) built on Qwen2.5-VL-3B-Instruct. It was developed through our HAR Framework, which incorporates a series of tailored training strategies. HAR-GUI-3B maintains a stable short-term memory for episodic reasoning, letting it flexibly perceive the sequential cues of an episode and put them to effective use. This strengthened reasoning helps the agent execute long-horizon interactions and achieve consistent, sustained gains across GUI-oriented tasks. Further details can be found in our paper.

## Quick Start
The following Python script demonstrates how to use HAR-GUI-3B for GUI automation. It assumes a local vLLM server is serving the model; adapt the code to your needs.
```bash
# Start the vLLM service
nohup python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2.5-VL-3B-Instruct --model ./HAR-GUI-3B -tp 4 > log.txt &
# nohup python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2.5-VL-72B-Instruct --model ./Qwen2.5-VL-72B-Instruct -tp 8 > log.txt &

# Serve your local directory (e.g. screenshots) over HTTP
cd ./your_directory/
python3 -m http.server 6666
```

```python
import requests
import json
from tqdm import tqdm

# ... (omitted)

def act2sum_fn(meta_data):
    # ... (omitted)
    return pred
#############################################################################################

url = "http://localhost:8000/v1/chat/completions"
headers = {
    "Content-Type": "application/json"
}

# ... (omitted)

if __name__ == "__main__":
    # ... (omitted)
    # evaluate(inference_data)
    with open("your_saving_path.json", "w") as f:
        f.write(json.dumps(inference_data, indent=4))
```
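If you want to probe the server without the full script, the request payload can be sketched as below. This is a minimal illustration, not part of the official script: `build_request` is a helper we introduce here, the model name must match the `--served-model-name` passed to vLLM, and `screenshot.png` is a hypothetical file placed in the directory served by `http.server`.

```python
import json

def build_request(image_url, instruction, model="Qwen2.5-VL-3B-Instruct"):
    """Build an OpenAI-style chat-completions payload with one screenshot."""
    return {
        "model": model,  # must match --served-model-name passed to vLLM
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": instruction},
            ],
        }],
        "temperature": 0.0,  # deterministic decoding for GUI actions
    }

payload = build_request(
    "http://localhost:6666/screenshot.png",  # hypothetical file served by http.server
    "Describe the UI elements on this screen.",
)
print(json.dumps(payload, indent=2))

# To actually send it (requires the vLLM server from the Quick Start):
# import requests
# resp = requests.post("http://localhost:8000/v1/chat/completions",
#                      headers={"Content-Type": "application/json"}, json=payload)
# print(resp.json()["choices"][0]["message"]["content"])
```

Serving screenshots via `http.server` keeps the payload small; passing base64-encoded images inline is an alternative if you prefer not to run a second process.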