Add pipeline tag and library name
This PR improves the model card metadata by adding the `image-text-to-text` pipeline tag and identifying `transformers` as the library name. These additions ensure the model is correctly categorized on the Hugging Face Hub and enable automated code snippets for users. It also ensures the model is properly linked to the relevant research paper and datasets.
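As a rough illustration of the metadata this PR touches, the sketch below parses a model card's YAML front matter and reports whether `pipeline_tag` and `library_name` are present. The helper names `parse_front_matter` and `missing_keys` are hypothetical, and the parser is an assumption-laden simplification: it handles only the flat `key: value` and `- item` list subset that appears in this card, not full YAML.

```python
# Assumption: these are the two keys the PR description says it adds.
REQUIRED_KEYS = ("pipeline_tag", "library_name")


def parse_front_matter(card_text):
    """Parse the flat YAML subset used in this card (scalars and '- item' lists)."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the card
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent "key:" line, if any.
            if current_key is not None:
                metadata[current_key].append(stripped[2:].strip())
            continue
        if ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                metadata[key] = value  # scalar entry, e.g. "license: apache-2.0"
                current_key = None
            else:
                metadata[key] = []  # list entry; items follow on later lines
                current_key = key
    return metadata


def missing_keys(metadata, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not metadata.get(key)]


# Front matter as it appears on the new side of this diff.
CARD = """---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
datasets:
- yolay/SmartSnap-FT
- yolay/SmartSnap-RL
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- agent
- mobile
- gui
---

<div align="center">
"""

metadata = parse_front_matter(CARD)
print(missing_keys(metadata))  # prints []
```

For real cards, a full YAML parser or the `ModelCard` class from `huggingface_hub` would likely be more robust than this hand-rolled subset parser.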
README.md
CHANGED
@@ -1,23 +1,22 @@
 ---
-
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
 datasets:
 - yolay/SmartSnap-FT
 - yolay/SmartSnap-RL
 language:
 - en
+license: apache-2.0
 metrics:
 - accuracy
-
-
+pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - agent
 - mobile
 - gui
 ---
 
-
-
-
 <div align="center">
 <img src="https://raw.githubusercontent.com/yuleiqin/images/master/SmartSnap/mascot_smartsnap.png" width="400"/>
 </div>
@@ -28,7 +27,6 @@ tags:
 
 </p>
 
-
 We introduce **SmartSnap**, a paradigm shift that transforms GUI agents📱💻🤖 from passive task executors into proactive self-verifiers. By empowering agents to curate their own evidence of success through the **3C Principles** (Completeness, Conciseness, Creativity), we eliminate the bottleneck of expensive post-hoc verification while boosting reliability and performance on complex mobile tasks.
 
 # 📖 Overview
@@ -116,13 +114,10 @@ We release the following resources to accelerate research in self-verifying agen
 | **FT (ours)** | Qwen3-32B-Instruct | 28.98<sup>(+10.86%)</sup> | 35.92 | 97.79 | 97.33 |
 | **RL (ours)** | Qwen3-32B-Instruct | <u>34.78</u><sup>(+16.66%)</sup> | 40.26 | 89.47 | 93.67 |
 
-
-
 *<sup>*</sup> LLaMA3.1 models only natively support tool calling w/o reasoning.*
 *<sup>†</sup> The Android Instruct dataset is used for fine-tuning where self-verification is not performed.*
 *<sup>‡</sup> The official results are cited here for comparison.*
 
-
 ---
 
 - **Performance gains**: All model families achieve >16% improvement over prompting baselines, reaching competitive performance with models 10-30× larger.