---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
base_model: Qwen/Qwen3-0.6B
tags:
- chat
- function-calling
- tool-use
- star-method
- sota
library_name: transformers
---

# STAR-0b6

## Introduction

**STAR-0b6** is a 0.6B-parameter language model specialized in function calling, achieving **state-of-the-art (SOTA)** performance on the [Berkeley Function Calling Leaderboard (BFCL)](https://huggingface.co/spaces/gorilla-llm/berkeley-function-calling-leaderboard) among models in its size class.

This model is the result of fine-tuning the `Qwen/Qwen3-0.6B` base model with the **STAR (Similarity-guided Teacher-Assisted Refinement)** framework, a holistic training curriculum designed to transfer the advanced capabilities of large language models (LLMs) into "super-tiny" models, making them powerful, accessible, and efficient for real-world agentic applications.

The key innovations of the STAR framework are:
- **Similarity-guided RL (Sim-RL)**: a reinforcement learning mechanism that uses a fine-grained, similarity-based reward. Compared with a simple binary reward, this provides a more robust and continuous signal for policy optimization, which is crucial for complex, multi-solution tasks like function calling.
- **Constrained Knowledge Distillation (CKD)**: a training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions. This preserves training stability and the model's exploration capacity, creating a strong foundation for the subsequent RL phase.
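
To make the Sim-RL idea concrete, a similarity-based reward can grant partial credit when a predicted call names the right function but differs on some arguments, instead of scoring the whole call 0 or 1. The scoring below is a hypothetical sketch (the weighting is our own; the paper's exact Sim-RL reward may differ):

```python
# Illustrative similarity reward for a predicted vs. reference function call.
# Hypothetical scoring, not the paper's exact formulation.

def similarity_reward(pred: dict, ref: dict) -> float:
    """Score a predicted call against a reference call, in [0, 1]."""
    if pred.get("name") != ref.get("name"):
        return 0.0  # wrong function: no partial credit on arguments
    pred_args = pred.get("arguments", {})
    ref_args = ref.get("arguments", {})
    if not ref_args:
        # No arguments expected: full credit, half credit if extras were invented.
        return 1.0 if not pred_args else 0.5
    # Fraction of reference arguments reproduced exactly,
    # penalized for hallucinated extra arguments.
    matched = sum(1 for k, v in ref_args.items() if pred_args.get(k) == v)
    extra = len(set(pred_args) - set(ref_args))
    return matched / (len(ref_args) + extra)

pred = {"name": "get_weather", "arguments": {"city": "San Francisco", "unit": "celsius"}}
ref = {"name": "get_weather", "arguments": {"city": "San Francisco", "unit": "fahrenheit"}}
print(similarity_reward(pred, ref))  # 0.5: name and one of two arguments match
```

A continuous score like this lets the policy improve from near-misses that a binary exact-match reward would discard.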
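
The CKD objective can likewise be sketched in a few lines: forward KL restricted to the teacher's top-k tokens, plus a penalty on student probability mass assigned to tokens the teacher confidently rules out. This is a simplified illustration of the idea (the threshold and penalty weight are our own assumptions, not the paper's exact loss):

```python
import math

def ckd_loss(teacher_probs, student_probs, k=2, penalty_weight=1.0, floor=0.01):
    """CKD-style loss over one next-token distribution (illustrative only)."""
    # Indices of the teacher's top-k most probable tokens.
    top_k = sorted(range(len(teacher_probs)), key=lambda i: -teacher_probs[i])[:k]
    # Forward KL computed only on the teacher's top-k support.
    kl = sum(
        teacher_probs[i] * math.log(teacher_probs[i] / max(student_probs[i], 1e-12))
        for i in top_k
    )
    # Suppress confidently incorrect predictions: student mass on tokens
    # the teacher assigns near-zero probability.
    bad_mass = sum(
        student_probs[i]
        for i in range(len(teacher_probs))
        if i not in top_k and teacher_probs[i] < floor
    )
    return kl + penalty_weight * bad_mass

teacher = [0.70, 0.25, 0.03, 0.02]  # teacher next-token distribution
student = [0.50, 0.20, 0.25, 0.05]  # student next-token distribution
print(ckd_loss(teacher, student, k=2))
```

Restricting the KL term to the top-k support keeps gradients focused on the teacher's plausible continuations, while the penalty term discourages the confidently wrong predictions that destabilize the later RL phase.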

STAR-0b6 significantly outperforms other open models under 1B parameters and even surpasses several larger models, demonstrating the effectiveness of the STAR methodology. For more details, please refer to our paper: [STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models](https://anonymous.4open.science/r/star-repo).

## Model Details

- **Model Type**: Causal language model, fine-tuned for function calling.
- **Base Model**: `Qwen/Qwen3-0.6B`
- **Training Framework**: STAR (CKD + Sim-RL)
- **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and attention QK-Norm.
- **Number of Parameters**: ~0.6B
- **Context Length**: Up to 32,768 tokens.

## Requirements

Qwen3-based models require a recent version of `transformers` (Qwen3 support was added in v4.51.0); we recommend the latest release.

```bash
pip install transformers torch accelerate
```

## Quickstart

The snippet below shows how to load STAR-0b6 and use it for a chat-based task.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "star-lab/STAR-0b6"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example prompt that could trigger a function call
prompt = "What is the current weather in San Francisco?"
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to external tools."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
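
For actual tool use, the chat template can also be given the tool schemas: `tokenizer.apply_chat_template` accepts a `tools` argument, and Qwen-family templates wrap emitted calls in `<tool_call>...</tool_call>` tags. The sketch below shows an OpenAI-style schema and a parser for that tag format; the tool name and schema are illustrative, so adapt them to your tools:

```python
import json
import re

# Illustrative OpenAI-style tool schema; pass a list like this as
# tools=tools to tokenizer.apply_chat_template(...) when building the prompt.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

def parse_tool_calls(response: str):
    """Extract JSON tool calls from <tool_call>...</tool_call> blocks."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(response)]

sample = ('<tool_call>\n{"name": "get_current_weather", '
          '"arguments": {"city": "San Francisco"}}\n</tool_call>')
calls = parse_tool_calls(sample)
print(calls[0]["name"])  # get_current_weather
```

In a full agent loop, you would execute each parsed call, append its result to `messages` as a `tool`-role message, and generate again so the model can compose a final answer.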

## Evaluation & Performance

STAR-0b6 sets a new state of the art for models of its size on established function calling benchmarks:

- **BFCLv3**: 51.70% overall accuracy, outperforming all baseline and recent methods in its size class.
- **ACEBench**: 53.00% summary score, demonstrating strong generalization and robustness. This is well above the base model (27.20%) and even surpasses much larger models such as Llama3.1-8B (46.60%).

## Citation

If you find our work helpful, please consider citing the STAR paper:

```bibtex
@article{star2025,
  title={STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models},
  author={Ni, Jiliang and Pu, Jiachen and Yang, Zhongyi and Luo, Jingfeng and Hu, Conggang},
  journal={arXiv preprint},
  year={2025}
}
```