Commit 014cf91 (verified) by nielsr (HF Staff) · Parent: d5c820e

Add pipeline tag and library name


This PR improves the model card metadata by adding the `image-text-to-text` pipeline tag and identifying `transformers` as the library name. These additions ensure the model is correctly categorized on the Hugging Face Hub and enable automated code snippets for users. It also ensures the model is properly linked to the relevant research paper and datasets.
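For context, `pipeline_tag` and `library_name` are the two front-matter fields the Hub reads to categorize a model and render a "Use this model" snippet. Below is a minimal sketch of the kind of snippet this metadata enables, assuming a recent `transformers` release that ships the `image-text-to-text` pipeline; the repo id is a placeholder, not the actual model id from this PR.

```python
from transformers import pipeline

# Hypothetical repo id -- substitute the actual SmartSnap model repository.
pipe = pipeline("image-text-to-text", model="<org>/<smartsnap-model>")

# The image-text-to-text pipeline accepts chat-style messages that mix
# image and text content, e.g. a GUI screenshot plus an instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/screenshot.png"},
            {"type": "text", "text": "Which button submits the form?"},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=64)
print(outputs[0]["generated_text"])
```

Without `pipeline_tag` the Hub has to infer the task from the repo contents, and without `library_name` it cannot tell which framework's snippet to render, so setting both explicitly is what drives the automated snippets the description above mentions.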

Files changed (1): README.md (+5, −10)
README.md CHANGED

```diff
@@ -1,23 +1,22 @@
 ---
-license: apache-2.0
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
 datasets:
 - yolay/SmartSnap-FT
 - yolay/SmartSnap-RL
 language:
 - en
+license: apache-2.0
 metrics:
 - accuracy
-base_model:
-- meta-llama/Llama-3.1-8B-Instruct
+pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - agent
 - mobile
 - gui
 ---
 
-
-
-
 <div align="center">
 <img src="https://raw.githubusercontent.com/yuleiqin/images/master/SmartSnap/mascot_smartsnap.png" width="400"/>
 </div>
@@ -28,7 +27,6 @@ tags:
 &nbsp;
 </p>
 
-
 We introduce **SmartSnap**, a paradigm shift that transforms GUI agents📱💻🤖 from passive task executors into proactive self-verifiers. By empowering agents to curate their own evidence of success through the **3C Principles** (Completeness, Conciseness, Creativity), we eliminate the bottleneck of expensive post-hoc verification while boosting reliability and performance on complex mobile tasks.
 
 # 📖 Overview
@@ -116,13 +114,10 @@ We release the following resources to accelerate research in self-verifying agen
 | **FT (ours)** | Qwen3-32B-Instruct | 28.98<sup>(+10.86%)</sup> | 35.92 | 97.79 | 97.33 |
 | **RL (ours)** | Qwen3-32B-Instruct | <u>34.78</u><sup>(+16.66%)</sup> | 40.26 | 89.47 | 93.67 |
 
-
-
 *<sup>*</sup> LLaMA3.1 models only natively support tool calling w/o reasoning.*
 *<sup>†</sup> The Android Instruct dataset is used for fine-tuning where self-verification is not performed.*
 *<sup>‡</sup> The official results are cited here for comparison.*
 
-
 ---
 
 - **Performance gains**: All model families achieve >16% improvement over prompting baselines, reaching competitive performance with models 10-30× larger.
```