---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
base_model: Qwen/Qwen3-0.6B
tags:
- chat
- function-calling
- tool-use
- star-method
- sota
library_name: transformers
---

# STAR-0b6

## Introduction

**STAR-0b6** is a 0.6B-parameter language model specialized in function calling, achieving **state-of-the-art (SOTA)** performance on the [Berkeley Function Calling Leaderboard (BFCL)](https://huggingface.co/spaces/gorilla-llm/berkeley-function-calling-leaderboard) among models in its size class.

This model is the result of fine-tuning the `Qwen/Qwen3-0.6B` base model with the **STAR (Similarity-guided Teacher-Assisted Refinement)** framework, a holistic training curriculum designed to transfer the advanced capabilities of large language models (LLMs) into "super-tiny" models, making them powerful, accessible, and efficient for real-world agentic applications.

The key innovations of the STAR framework are:
- **Similarity-guided RL (Sim-RL)**: a reinforcement learning mechanism that uses a fine-grained, similarity-based reward. Compared with a simple binary reward, this provides a more robust and continuous signal for policy optimization, which is crucial for complex, multi-solution tasks like function calling.
- **Constrained Knowledge Distillation (CKD)**: a training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions. This preserves training stability and the model's exploration capacity, creating a strong foundation for the subsequent RL phase.
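
To make the Sim-RL idea concrete, a similarity-based reward can grant partial credit when a predicted call names the right function but differs on some arguments, instead of scoring the whole call 0 or 1. The scoring below is a hypothetical sketch (the weighting is our own; the paper's exact Sim-RL reward may differ):

```python
# Illustrative similarity reward for a predicted vs. reference function call.
# Hypothetical scoring, not the paper's exact formulation.

def similarity_reward(pred: dict, ref: dict) -> float:
    """Score a predicted call against a reference call, in [0, 1]."""
    if pred.get("name") != ref.get("name"):
        return 0.0  # wrong function: no partial credit on arguments
    pred_args = pred.get("arguments", {})
    ref_args = ref.get("arguments", {})
    if not ref_args:
        # No arguments expected: full credit, half credit if extras were invented.
        return 1.0 if not pred_args else 0.5
    # Fraction of reference arguments reproduced exactly,
    # penalized for hallucinated extra arguments.
    matched = sum(1 for k, v in ref_args.items() if pred_args.get(k) == v)
    extra = len(set(pred_args) - set(ref_args))
    return matched / (len(ref_args) + extra)

pred = {"name": "get_weather", "arguments": {"city": "San Francisco", "unit": "celsius"}}
ref = {"name": "get_weather", "arguments": {"city": "San Francisco", "unit": "fahrenheit"}}
print(similarity_reward(pred, ref))  # 0.5: name and one of two arguments match
```

A continuous score like this lets the policy improve from near-misses that a binary exact-match reward would discard.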
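
The CKD objective can likewise be sketched in a few lines: forward KL restricted to the teacher's top-k tokens, plus a penalty on student probability mass assigned to tokens the teacher confidently rules out. This is a simplified illustration of the idea (the threshold and penalty weight are our own assumptions, not the paper's exact loss):

```python
import math

def ckd_loss(teacher_probs, student_probs, k=2, penalty_weight=1.0, floor=0.01):
    """CKD-style loss over one next-token distribution (illustrative only)."""
    # Indices of the teacher's top-k most probable tokens.
    top_k = sorted(range(len(teacher_probs)), key=lambda i: -teacher_probs[i])[:k]
    # Forward KL computed only on the teacher's top-k support.
    kl = sum(
        teacher_probs[i] * math.log(teacher_probs[i] / max(student_probs[i], 1e-12))
        for i in top_k
    )
    # Suppress confidently incorrect predictions: student mass on tokens
    # the teacher assigns near-zero probability.
    bad_mass = sum(
        student_probs[i]
        for i in range(len(teacher_probs))
        if i not in top_k and teacher_probs[i] < floor
    )
    return kl + penalty_weight * bad_mass

teacher = [0.70, 0.25, 0.03, 0.02]  # teacher next-token distribution
student = [0.50, 0.20, 0.25, 0.05]  # student next-token distribution
print(ckd_loss(teacher, student, k=2))
```

Restricting the KL term to the top-k support keeps gradients focused on the teacher's plausible continuations, while the penalty term discourages the confidently wrong predictions that destabilize the later RL phase.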

STAR-0b6 significantly outperforms other open models under 1B parameters and even surpasses several larger models, demonstrating the effectiveness of the STAR methodology. For more details, please refer to our paper: [STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models](https://anonymous.4open.science/r/star-repo).

## Model Details

- **Model Type**: Causal language model, fine-tuned for function calling.
- **Base Model**: `Qwen/Qwen3-0.6B`
- **Training Framework**: STAR (CKD + Sim-RL)
- **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and attention QK-Norm.
- **Number of Parameters**: ~0.6B
- **Context Length**: Up to 32,768 tokens.

## Requirements

Qwen3-based models require a recent version of `transformers` (Qwen3 support was added in v4.51.0); we recommend the latest release.

```bash
pip install transformers torch accelerate
```

## Quickstart

The snippet below shows how to load STAR-0b6 and use it for a chat-based task.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "star-lab/STAR-0b6"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example prompt that could trigger a function call
prompt = "What is the current weather in San Francisco?"
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to external tools."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
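
For actual tool use, the chat template can also be given the tool schemas: `tokenizer.apply_chat_template` accepts a `tools` argument, and Qwen-family templates wrap emitted calls in `<tool_call>...</tool_call>` tags. The sketch below shows an OpenAI-style schema and a parser for that tag format; the tool name and schema are illustrative, so adapt them to your tools:

```python
import json
import re

# Illustrative OpenAI-style tool schema; pass a list like this as
# tools=tools to tokenizer.apply_chat_template(...) when building the prompt.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

def parse_tool_calls(response: str):
    """Extract JSON tool calls from <tool_call>...</tool_call> blocks."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(response)]

sample = ('<tool_call>\n{"name": "get_current_weather", '
          '"arguments": {"city": "San Francisco"}}\n</tool_call>')
calls = parse_tool_calls(sample)
print(calls[0]["name"])  # get_current_weather
```

In a full agent loop, you would execute each parsed call, append its result to `messages` as a `tool`-role message, and generate again so the model can compose a final answer.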

## Evaluation & Performance

STAR-0b6 sets a new state of the art for models of its size on established function calling benchmarks:

- **BFCLv3**: 51.70% overall accuracy, outperforming all baseline and recent methods in its size class.
- **ACEBench**: 53.00% summary score, demonstrating strong generalization and robustness. This is well above the base model (27.20%) and even surpasses much larger models such as Llama3.1-8B (46.60%).

## Citation

If you find our work helpful, please consider citing the STAR paper:

```bibtex
@article{star2025,
  title={STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models},
  author={Ni, Jiliang and Pu, Jiachen and Yang, Zhongyi and Luo, Jingfeng and Hu, Conggang},
  journal={arXiv preprint},
  year={2025}
}
```