yolay/SmartSnap-LLaMA3.1-8B

@@ -1,23 +1,22 @@
 ---
-license: apache-2.0
 datasets:
 - yolay/SmartSnap-FT
 - yolay/SmartSnap-RL
 language:
 - en
 metrics:
 - accuracy
-base_model:
-- meta-llama/Llama-3.1-8B-Instruct
 tags:
 - agent
 - mobile
 - gui
 ---
 <div align="center">
   <img src="https://raw.githubusercontent.com/yuleiqin/images/master/SmartSnap/mascot_smartsnap.png" width="400"/>
 </div>
@@ -28,7 +27,6 @@ tags:
   &nbsp;
 </p>
 We introduce **SmartSnap**, a paradigm shift that transforms GUI agents📱💻🤖 from passive task executors into proactive self-verifiers. By empowering agents to curate their own evidence of success through the **3C Principles** (Completeness, Conciseness, Creativity), we eliminate the bottleneck of expensive post-hoc verification while boosting reliability and performance on complex mobile tasks.
 # 📖 Overview
@@ -116,13 +114,10 @@ We release the following resources to accelerate research in self-verifying agen
 | **FT (ours)** | Qwen3-32B-Instruct | 28.98<sup>(+10.86%)</sup> | 35.92 | 97.79 | 97.33 |
 | **RL (ours)** | Qwen3-32B-Instruct | <u>34.78</u><sup>(+16.66%)</sup> | 40.26 | 89.47 | 93.67 |
 *<sup>*</sup> LLaMA3.1 models only natively support tool calling w/o reasoning.*
 *<sup>†</sup> The Android Instruct dataset is used for fine-tuning where self-verification is not performed.*
 *<sup>‡</sup> The official results are cited here for comparison.*
 ---
 - **Performance gains**: All model families achieve >16% improvement over prompting baselines, reaching competitive performance with models 10-30× larger.
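
The superscripted gains in the two Qwen3-32B-Instruct rows above can be cross-checked with simple arithmetic. The sketch below assumes each gain is an absolute difference in percentage points over a shared prompting baseline. That baseline row and the column header are not part of this excerpt, so the check only confirms that both rows imply the same baseline value. The helper name `implied_baseline` is hypothetical.

```python
# Scores and superscripted gains copied from the FT and RL rows
# for Qwen3-32B-Instruct (first score column of the table).
rows = {
    "FT": (28.98, 10.86),
    "RL": (34.78, 16.66),
}

def implied_baseline(score, gain):
    # Assumption: the gain is an absolute difference in percentage
    # points, so the baseline is simply score - gain.
    return round(score - gain, 2)

baselines = {name: implied_baseline(s, g) for name, (s, g) in rows.items()}

# Both rows should point back to the same prompting baseline.
consistent = len(set(baselines.values())) == 1
print(baselines, consistent)
```

Under that assumption both rows recover the same baseline, which suggests the ">16%" figure in the bullet above is also an absolute gain rather than a relative one.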

 ---
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
 datasets:
 - yolay/SmartSnap-FT
 - yolay/SmartSnap-RL
 language:
 - en
+license: apache-2.0
 metrics:
 - accuracy
+pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - agent
 - mobile
 - gui
 ---
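
The front matter on the added side of this change (the `+` lines for `base_model`, `license`, `pipeline_tag`, and `library_name`) can be sanity-checked before merging. What follows is a minimal sketch, assuming only the flat subset of YAML used in this card (scalar `key: value` pairs and `key:` followed by `- item` lists). `parse_front_matter` is a hypothetical helper written for illustration, not part of any Hugging Face library, and it is not a general YAML parser.

```python
# Front matter as it appears on the added side of the diff.
FRONT_MATTER = """\
---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
datasets:
- yolay/SmartSnap-FT
- yolay/SmartSnap-RL
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- agent
- mobile
- gui
---
"""

def parse_front_matter(text):
    """Parse scalar 'key: value' pairs and 'key:' + '- item' lists only."""
    meta, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "---":
            continue  # skip blank lines and the front-matter fences
        if line.startswith("- ") and current is not None:
            meta[current].append(line[2:])  # list item for the open key
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            meta[key] = value   # scalar entry
            current = None
        else:
            meta[key] = []      # start of a list entry
            current = key
    return meta

meta = parse_front_matter(FRONT_MATTER)

# Keys this change adds or moves; all should be present afterwards.
required = ["base_model", "license", "pipeline_tag", "library_name"]
missing = [k for k in required if k not in meta]
print(missing)
```

An empty `missing` list indicates that all four keys touched by this change are present. For anything beyond this flat layout, a real YAML parser would be the safer choice.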