Update README.md
README.md (CHANGED)

@@ -26,7 +26,7 @@ The key innovations of the STAR framework include:

- **Similarity-guided RL (Sim-RL)**: A reinforcement learning mechanism that uses a fine-grained, similarity-based reward signal. This provides a more robust and continuous signal for policy optimization compared to simple binary rewards, which is crucial for complex, multi-solution tasks like function calling.
- **Constrained Knowledge Distillation (CKD)**: An advanced training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions. This ensures training stability while preserving the model's exploration capacity, creating a strong foundation for the subsequent RL phase.
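
As an illustration of the Sim-RL idea, a continuous similarity-based reward for a predicted function call might be sketched as below. The 50/50 name/argument weighting and the use of `difflib` are assumptions for illustration, not the paper's exact metric.

```python
import difflib
import json

def sim_reward(pred: dict, ref: dict) -> float:
    """Hypothetical similarity-based reward for a predicted function call.

    Instead of a binary 0/1 exact-match reward, it blends a function-name
    match with a string similarity over the serialized arguments, giving
    the policy a continuous learning signal on near-miss calls.
    """
    name_score = 1.0 if pred.get("name") == ref.get("name") else 0.0
    pred_args = json.dumps(pred.get("arguments", {}), sort_keys=True)
    ref_args = json.dumps(ref.get("arguments", {}), sort_keys=True)
    arg_score = difflib.SequenceMatcher(None, pred_args, ref_args).ratio()
    return 0.5 * name_score + 0.5 * arg_score

# A near-miss call earns partial credit rather than a flat zero:
ref = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
pred = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "fahrenheit"}}
```

Under a binary reward, the near-miss above would score zero; a continuous signal like this is what makes policy optimization tractable on multi-solution tasks.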

Our STAR-0b6 model significantly outperforms other open models under 1B parameters and even surpasses several larger models, demonstrating the effectiveness of the STAR methodology.
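
To make the CKD objective above concrete, here is a minimal sketch of a forward KL restricted to the teacher's top-k tokens, with an extra term suppressing confidently incorrect student predictions. The value of `k`, the confidence threshold, and the additive penalty form are illustrative assumptions, not the paper's exact loss.

```python
import math

def ckd_loss(teacher_probs, student_probs, k=3, conf_threshold=0.5):
    """Sketch of a Constrained Knowledge Distillation objective.

    Forward KL is computed only over the teacher's top-k tokens; a
    penalty then suppresses tokens on which the student is confidently
    wrong (high student probability outside the teacher's top-k).
    """
    top_k = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)[:k]
    kl = sum(teacher_probs[i]
             * math.log(teacher_probs[i] / max(student_probs[i], 1e-9))
             for i in top_k if teacher_probs[i] > 0)
    penalty = sum(student_probs[i] for i in range(len(student_probs))
                  if i not in top_k and student_probs[i] > conf_threshold)
    return kl + penalty
```

When the student matches the teacher the loss vanishes, while mass confidently placed on a token the teacher rules out is penalized directly, which is the stabilizing behavior the CKD bullet describes.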

## Model Details

@@ -103,17 +103,4 @@ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTr
STAR-0b6 has established a new state-of-the-art for models of its size on renowned function calling benchmarks.

- BFCLv3: Achieved 51.70% overall accuracy, outperforming all baseline and recent methods.
- ACEBench: Achieved 53.00% summary score, demonstrating superior generalization and robustness. This score is significantly higher than that of its base model (27.20%) and even surpasses much larger models like Llama3.1-8B (46.60%).

## Citation

If you find our work helpful, please consider citing the STAR paper:

```bibtex
@article{star2025,
  title={STAR: Similarity-Guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models},
  author={Ni, Jiliang and Pu, Jiachen and Yang, Zhongyi and Luo, Jingfeng and Hu, Conggang},
  journal={arXiv preprint},
  year={2025}
}
```