---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
pipeline_tag: image-text-to-text
tags:
- GUI-Agent
- GUI-Perception
- Screen-Understanding
---

## Introduction

**HAR-GUI-3B** is a GUI-tailored native model (a native end-to-end GUI agent) built on Qwen2.5-VL-3B-Instruct. It was developed through our HAR Framework, which incorporates a series of tailored training strategies. HAR-GUI-3B maintains a stable short-term memory for episodic reasoning, letting it flexibly perceive the sequential cues of an episode and put them to effective use. This strengthened reasoning helps the agent execute long-horizon interactions and achieve consistent, sustained gains across GUI-oriented tasks. Further details can be found in our paper.

## Quick Start
The following Python script demonstrates how to use HAR-GUI-3B for GUI automation. It assumes a local vLLM server is serving the model; adapt the code to your needs.
```bash
# Start the vLLM service
nohup python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2.5-VL-3B-Instruct --model ./HAR-GUI-3B -tp 4 > log.txt &
# nohup python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2.5-VL-72B-Instruct --model ./Qwen2.5-VL-72B-Instruct -tp 8 > log.txt &

# Serve your local directory (e.g. screenshots) over HTTP
cd ./your_directory/
python3 -m http.server 6666
```

```python
import requests
import json
from tqdm import tqdm

# ... (omitted)

def act2sum_fn(meta_data):
    # ... (omitted)
    return pred
#############################################################################################

url = "http://localhost:8000/v1/chat/completions"
headers = {
    "Content-Type": "application/json"
}

# ... (omitted)

if __name__ == "__main__":
    # ... (omitted)
    # evaluate(inference_data)
    with open("your_saving_path.json", "w") as f:
        f.write(json.dumps(inference_data, indent=4))
```
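If you want to probe the server without the full script, the request payload can be sketched as below. This is a minimal illustration, not part of the official script: `build_request` is a helper we introduce here, the model name must match the `--served-model-name` passed to vLLM, and `screenshot.png` is a hypothetical file placed in the directory served by `http.server`.

```python
import json

def build_request(image_url, instruction, model="Qwen2.5-VL-3B-Instruct"):
    """Build an OpenAI-style chat-completions payload with one screenshot."""
    return {
        "model": model,  # must match --served-model-name passed to vLLM
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": instruction},
            ],
        }],
        "temperature": 0.0,  # deterministic decoding for GUI actions
    }

payload = build_request(
    "http://localhost:6666/screenshot.png",  # hypothetical file served by http.server
    "Describe the UI elements on this screen.",
)
print(json.dumps(payload, indent=2))

# To actually send it (requires the vLLM server from the Quick Start):
# import requests
# resp = requests.post("http://localhost:8000/v1/chat/completions",
#                      headers={"Content-Type": "application/json"}, json=payload)
# print(resp.json()["choices"][0]["message"]["content"])
```

Serving screenshots via `http.server` keeps the payload small; passing base64-encoded images inline is an alternative if you prefer not to run a second process.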