Add pipeline tag and library name #1
by nielsr (HF Staff) - opened
README.md CHANGED

@@ -1,23 +1,22 @@
 ---
-
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
 datasets:
 - yolay/SmartSnap-FT
 - yolay/SmartSnap-RL
 language:
 - en
+license: apache-2.0
 metrics:
 - accuracy
-
-
+pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - agent
 - mobile
 - gui
 ---
 
-
-
-
 <div align="center">
 <img src="https://raw.githubusercontent.com/yuleiqin/images/master/SmartSnap/mascot_smartsnap.png" width="400"/>
 </div>
@@ -28,7 +27,6 @@ tags:
 
 </p>
 
-
 We introduce **SmartSnap**, a paradigm shift that transforms GUI agents📱💻🤖 from passive task executors into proactive self-verifiers. By empowering agents to curate their own evidence of success through the **3C Principles** (Completeness, Conciseness, Creativity), we eliminate the bottleneck of expensive post-hoc verification while boosting reliability and performance on complex mobile tasks.
 
 # 📖 Overview
@@ -116,13 +114,10 @@ We release the following resources to accelerate research in self-verifying agen
 | **FT (ours)** | Qwen3-32B-Instruct | 28.98<sup>(+10.86%)</sup> | 35.92 | 97.79 | 97.33 |
 | **RL (ours)** | Qwen3-32B-Instruct | <u>34.78</u><sup>(+16.66%)</sup> | 40.26 | 89.47 | 93.67 |
 
-
-
 *<sup>*</sup> LLaMA3.1 models only natively support tool calling w/o reasoning.*
 *<sup>†</sup> The Android Instruct dataset is used for fine-tuning where self-verification is not performed.*
 *<sup>‡</sup> The official results are cited here for comparison.*
 
-
 ---
 
 - **Performance gains**: All model families achieve >16% improvement over prompting baselines, reaching competitive performance with models 10-30× larger.
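
The front-matter change in this PR can be sanity-checked with a small script. The sketch below is illustrative only: it embeds the updated front matter as a string and uses a tiny hand-rolled parser (`parse_front_matter` and `missing_keys` are hypothetical helper names, not part of any library) to confirm that `pipeline_tag` and `library_name` are present. It assumes flat `key: value` pairs and simple `- item` lists, which is all this README uses; a real check would more likely rely on a proper YAML parser.

```python
# Minimal sketch: confirm that a model card's front matter declares the
# metadata keys added in this PR. The parser is a simplified assumption that
# only handles flat `key: value` pairs and `- item` lists.

README = """\
---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
datasets:
- yolay/SmartSnap-FT
- yolay/SmartSnap-RL
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- agent
- mobile
- gui
---

<div align="center">
"""


def parse_front_matter(text):
    """Parse flat front matter between the leading `---` delimiters into a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recent `key:` line
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value      # scalar entry, e.g. `license: apache-2.0`
                current_key = None
            else:
                meta[key] = []         # start of a list, e.g. `tags:`
                current_key = key
    return meta


def missing_keys(meta, required=("pipeline_tag", "library_name")):
    """Return the required keys that are absent from the parsed metadata."""
    return [key for key in required if key not in meta]


metadata = parse_front_matter(README)
print(missing_keys(metadata))  # prints [] when both keys are present
```

Run against the pre-PR front matter, the same check would report both keys as missing, which is the gap this change closes.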