Update README.md
README.md (CHANGED)

@@ -26,7 +26,7 @@ The key innovations of the STAR framework include:

- **Similarity-guided RL (Sim-RL)**: A reinforcement learning mechanism that uses a fine-grained, similarity-based reward signal. This provides a more robust and continuous signal for policy optimization compared to simple binary rewards, which is crucial for complex, multi-solution tasks like function calling.
- **Constrained Knowledge Distillation (CKD)**: An advanced training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions. This ensures training stability while preserving the model's exploration capacity, creating a strong foundation for the subsequent RL phase.
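
As an illustration of the Sim-RL idea, a continuous similarity-based reward for a predicted function call might be sketched as below. The 50/50 name/argument weighting and the use of `difflib` are assumptions for illustration, not the paper's exact metric.

```python
import difflib
import json

def sim_reward(pred: dict, ref: dict) -> float:
    """Hypothetical similarity-based reward for a predicted function call.

    Instead of a binary 0/1 exact-match reward, it blends a function-name
    match with a string similarity over the serialized arguments, giving
    the policy a continuous learning signal on near-miss calls.
    """
    name_score = 1.0 if pred.get("name") == ref.get("name") else 0.0
    pred_args = json.dumps(pred.get("arguments", {}), sort_keys=True)
    ref_args = json.dumps(ref.get("arguments", {}), sort_keys=True)
    arg_score = difflib.SequenceMatcher(None, pred_args, ref_args).ratio()
    return 0.5 * name_score + 0.5 * arg_score

# A near-miss call earns partial credit rather than a flat zero:
ref = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
pred = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "fahrenheit"}}
```

Under a binary reward, the near-miss above would score zero; a continuous signal like this is what makes policy optimization tractable on multi-solution tasks.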

Our STAR-0b6 model significantly outperforms other open models under 1B parameters and even surpasses several larger models, demonstrating the effectiveness of the STAR methodology.
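
To make the CKD objective above concrete, here is a minimal sketch of a forward KL restricted to the teacher's top-k tokens, with an extra term suppressing confidently incorrect student predictions. The value of `k`, the confidence threshold, and the additive penalty form are illustrative assumptions, not the paper's exact loss.

```python
import math

def ckd_loss(teacher_probs, student_probs, k=3, conf_threshold=0.5):
    """Sketch of a Constrained Knowledge Distillation objective.

    Forward KL is computed only over the teacher's top-k tokens; a
    penalty then suppresses tokens on which the student is confidently
    wrong (high student probability outside the teacher's top-k).
    """
    top_k = sorted(range(len(teacher_probs)),
                   key=lambda i: teacher_probs[i], reverse=True)[:k]
    kl = sum(teacher_probs[i]
             * math.log(teacher_probs[i] / max(student_probs[i], 1e-9))
             for i in top_k if teacher_probs[i] > 0)
    penalty = sum(student_probs[i] for i in range(len(student_probs))
                  if i not in top_k and student_probs[i] > conf_threshold)
    return kl + penalty
```

When the student matches the teacher the loss vanishes, while mass confidently placed on a token the teacher rules out is penalized directly, which is the stabilizing behavior the CKD bullet describes.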

## Model Details

@@ -103,17 +103,4 @@ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTr
STAR-0b6 has established a new state-of-the-art for models of its size on renowned function calling benchmarks.

- BFCLv3: Achieved 51.70% overall accuracy, outperforming all baseline and recent methods.
- ACEBench: Achieved 53.00% summary score, demonstrating superior generalization and robustness. This score is significantly higher than that of its base model (27.20%) and even surpasses much larger models like Llama3.1-8B (46.60%).

## Citation

If you find our work helpful, please consider citing the STAR paper:

```bibtex
@article{star2025,
  title={STAR: Similarity-Guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models},
  author={Ni, Jiliang and Pu, Jiachen and Yang, Zhongyi and Luo, Jingfeng and Hu, Conggang},
  journal={arXiv preprint},
  year={2025}
}
```