Bingguang committed on
Commit c1387ef · verified · 1 Parent(s): 9efc216

Update README.md

  1. README.md +243 -6
README.md CHANGED
@@ -4,12 +4,249 @@ base_model:
  - Qwen/Qwen3-4B-Instruct-2507
  ---
 
- # FunReason-MT
 
- **FunReason-MT** is a foundation model specialized in complex function calling. It is trained upon Qwen3-4B-Instruct-2507.
 
- ## Model Details
 
- * **Base Model:** `Qwen/Qwen3-4B-Instruct-2507`
- * **Finetuned for:** BFCL Benchmark
- * **Description:** An expert model for mastering intricate function calls, focusing on Berkeley Function-Calling Leaderboard (BFCL).
+ ---
+ license: apache-2.0
+ task_categories:
+ - question-answering
+ - text-generation
+ language:
+ - en
+ tags:
+ - agent
+ - Agentic Learning
+ - tool use
+ - BFCL
+ size_categories:
+ - 10K<n<100K
+ ---
+
+ # FunReason-MT-4B: Exceptional Multi-Turn Function Calling Model
+
+ <p align="center">
+ &nbsp;&nbsp;📊 <a href="https://huggingface.co/datasets/Bingguang/FunReason-MT">Dataset</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/Bingguang/FunReason-MT">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://arxiv.org/pdf/2505.20192">Paper</a>
+ </p>
+
+ ## Model Overview
+
+ **FunReason-MT-4B** is a **Large Language Model (LLM)** fine-tuned for complex, multi-turn **Function Calling (FC)** and agentic tool-use tasks. Built upon the **Qwen3-4B-Instruct-2507** base model, it was trained using the **FunReason-MT data synthesis framework**.
+
+ FunReason-MT-4B achieves state-of-the-art results on the **Berkeley Function-Calling Leaderboard (BFCLv3)** Multi-Turn and Agentic Evaluation benchmarks. This performance suggests that high-quality synthesized data can overcome the complexity barrier in multi-turn FC data generation.
+
+ - **Base Model:** Qwen3-4B-Instruct-2507
+ - **Size:** 4 billion parameters
+ - **Key Capability:** Advanced Multi-Turn Function Calling and Agentic Tool-Use
+
+ Full usage of the model is shown in this [pull request](https://github.com/ShishirPatil/gorilla/pull/1229).
+
+ ## 📊 Evaluation Results
+
+ The model was evaluated on the Berkeley Function-Calling Leaderboard (BFCL).
+
+ ### BFCLv3 Multi-Turn and Single-Turn Performance
+
+ | Model (4B - 235B) | Multi-Turn (Overall) | Single-Turn (Overall) |
+ | :------------------------------------- | :------------------------------------------: | :------------------------------------------: |
+ | Qwen3-4B-Instruct (Base) | 15.75 | 78.19 |
+ | **Qwen3-4B + FunReason-MT (RL)** | **56.50** | **85.02** |
+ | Claude-Sonnet-4-20250514 | 54.75 | 84.72 |
+ | DeepSeek-R1-0528 | 44.50 | 78.22 |
+ | GPT-4o-2024-11-20 | 42.50 | 77.21 |
+
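As a quick check on the table above, the gain over the base model can be recomputed from the two Qwen3-4B rows. This is a minimal sketch with the scores copied from the table; the dictionary and variable names are illustrative only.

```python
# Scores copied from the BFCLv3 table above (percent).
base = {"multi_turn": 15.75, "single_turn": 78.19}
funreason_mt = {"multi_turn": 56.50, "single_turn": 85.02}

# Absolute gain in percentage points, rounded to two decimals.
gains = {k: round(funreason_mt[k] - base[k], 2) for k in base}
print(gains)
```

This gives a gain of 40.75 points on Multi-Turn and 6.83 points on Single-Turn.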
+ ### BFCL Agentic Evaluation (BFCLv4 OOD)
+
+ The FunReason-MT-trained model leads on out-of-distribution agentic tasks (Web Search and Memory).
+
+ | Model | BFCLv4 Overall Score |
+ | :----------------------------- | :------------------------------------------: |
+ | **FunReason-MT-4B (RL)** | **15.10** |
+ | ToolACE-2-8B | 14.83 |
+ | BitAgent-8B | 8.24 |
+ | XLAM-2-3b-fc-r | 7.42 |
+ | watt-tool-8B | 6.30 |
+
+
+ -----
+
+ ## 💻 Training Data and Framework
+
+ ### FunReason-MT Dataset
+
+ The training set comprises **10,000 high-quality multi-turn samples**. This dataset was generated with the FunReason-MT data synthesis framework, which builds complex trajectories in three phases:
+
+ 1. **Environment-API Graph Interactions** for collecting goal-directed, correct execution traces.
+ 2. **Advanced Tool-Query Synthesis** for creating logical-jump queries that abstract multi-step actions.
+ 3. **Guided Iterative Chain** for enforcing reliable, consistent Chain-of-Thought (CoT) generation using self-correction.
+
+ ### Training Details
+
+ The model was fine-tuned on function-calling data from APIGen and the FunReason-MT dataset.
+
+ - **Training Libraries:** LLaMA-Factory and verl.
+ - **Methodology:** Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL).
+ - **Hardware:** Trained on 32 NVIDIA H20 GPUs.
+
+ ### Usage
+ The following snippet shows the BFCL handler for FunReason-MT.
+ ```python
+ # Imports are assumed from the BFCL (gorilla) codebase; adjust the
+ # OSSHandler module path to match your checkout.
+ import time
+ from typing import Any
+
+ from bfcl_eval.model_handler.local_inference.base_oss_handler import OSSHandler
+ from overrides import override
+
+
+ class FunReasonMTHandler(OSSHandler):
+     def __init__(self, model_name, temperature) -> None:
+         super().__init__(model_name, temperature)
+         self.is_fc_model = False
+         self.top_p = 0.7
+         self.max_output_len = 20000
+         self.max_context_length = 247000
+
+     @override
+     def _query_prompting(self, inference_data: dict):
+         print("override _query_prompting")
+         # We use the OpenAI Completions API
+         function: list[dict] = inference_data["function"]
+         message: list[dict] = inference_data["message"]
+
+         formatted_prompt: str = self._format_prompt(message, function)
+         inference_data["inference_input_log"] = {"formatted_prompt": formatted_prompt}
+
+         # Tokenize the formatted prompt to get the token count
+         input_token_count = len(self.tokenizer.tokenize(formatted_prompt))
+
+         # Request at most max_output_len tokens, and never more than the context has left.
+         if self.max_context_length < input_token_count + 2:
+             # The prompt already fills the context; request 1000 tokens, we will get an error anyway
+             leftover_tokens_count = 1000
+         else:
+             leftover_tokens_count = min(
+                 self.max_output_len,
+                 self.max_context_length - input_token_count - 2,
+             )
+
+         # Forward optional sampling controls only if the handler defines them
+         extra_body = {}
+         if hasattr(self, "stop_token_ids"):
+             extra_body["stop_token_ids"] = self.stop_token_ids
+         if hasattr(self, "skip_special_tokens"):
+             extra_body["skip_special_tokens"] = self.skip_special_tokens
+
+         start_time = time.time()
+         if len(extra_body) > 0:
+             api_response = self.client.completions.create(
+                 model=self.model_path_or_id,
+                 temperature=self.temperature,
+                 top_p=self.top_p,
+                 prompt=formatted_prompt,
+                 max_tokens=leftover_tokens_count,
+                 extra_body=extra_body,
+                 timeout=72000,  # Avoid timeout errors
+             )
+         else:
+             api_response = self.client.completions.create(
+                 model=self.model_path_or_id,
+                 temperature=self.temperature,
+                 top_p=self.top_p,
+                 prompt=formatted_prompt,
+                 max_tokens=leftover_tokens_count,
+                 timeout=72000,  # Avoid timeout errors
+             )
+         end_time = time.time()
+
+         return api_response, end_time - start_time
+
+     def _process_tool_response(self, tool_response_lst):
+         # Tool responses are passed through unchanged
+         processed_tool_response = []
+         for tool_response in tool_response_lst:
+             processed_tool_response.append(tool_response)
+         return processed_tool_response
+
+     @override
+     def _format_prompt(self, messages, function):
+         # Merge each run of consecutive tool messages into a single tool message
+         new_messages = []
+         tool_content = []
+         for idx, message in enumerate(messages):
+             role = message["role"]
+             content = message["content"]
+             if role != "tool":
+                 if len(tool_content) != 0:
+                     tool_message = {
+                         "role": "tool",
+                         "content": str(tool_content),
+                     }
+                     new_messages.append(tool_message)
+                     tool_content = []
+                 new_messages.append(message)
+             else:
+                 tool_content.append(content)
+         if len(tool_content) != 0:
+             tool_message = {
+                 "role": "tool",
+                 "content": str(tool_content),
+             }
+             new_messages.append(tool_message)
+             tool_content = []
+         print("new_messages", new_messages)
+         formatted_prompt = self.tokenizer.apply_chat_template(
+             new_messages, tokenize=False, add_generation_prompt=True
+         )
+         # Open the reasoning block so the model starts by thinking
+         formatted_prompt += "<think>"
+         print("formatted_prompt", formatted_prompt)
+         return formatted_prompt
+
+     @override
+     def _parse_query_response_prompting(self, api_response: Any) -> dict:
+         model_response = api_response.choices[0].text
+         reasoning_content = ""
+         if "</think>" in model_response:
+             # Text before </think> is the reasoning, text after it is the answer
+             parts = model_response.split("</think>")
+             reasoning_content = parts[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
+             cleaned_response = parts[-1].lstrip("\n")
+         else:
+             cleaned_response = "response outputs too long or no slash think in response."
+         print("cleaned_response: ", cleaned_response)
+         response_data = {
+             "model_responses": cleaned_response,
+             "model_responses_message_for_chat_history": {
+                 "role": "assistant",
+                 "content": cleaned_response,
+             },
+             "reasoning_content": reasoning_content,
+             "input_token": api_response.usage.prompt_tokens,
+             "output_token": api_response.usage.completion_tokens,
+         }
+
+         if reasoning_content:
+             # Attach reasoning content to the assistant message for the next turn
+             response_data["model_responses_message_for_chat_history"][
+                 "reasoning_content"
+             ] = reasoning_content
+         else:
+             del response_data["reasoning_content"]
+
+         return response_data
+ ```
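The handler's three pure steps (token budgeting, merging consecutive tool messages, and splitting the reasoning from the answer) can be tried without a model server. The sketch below re-implements them as standalone functions under that assumption; the names `leftover_tokens`, `merge_tool_messages`, and `split_reasoning` are hypothetical and are not part of BFCL or of the handler itself.

```python
def leftover_tokens(input_token_count: int,
                    max_output_len: int = 20000,
                    max_context_length: int = 247000) -> int:
    """Mirror the handler's budget: cap by max_output_len and by remaining context."""
    if max_context_length < input_token_count + 2:
        return 1000  # prompt already fills the context; the server is expected to reject it
    return min(max_output_len, max_context_length - input_token_count - 2)


def merge_tool_messages(messages: list[dict]) -> list[dict]:
    """Collapse each run of consecutive tool messages into one tool message."""
    merged: list[dict] = []
    pending: list = []
    for message in messages:
        if message["role"] == "tool":
            pending.append(message["content"])
            continue
        if pending:
            merged.append({"role": "tool", "content": str(pending)})
            pending = []
        merged.append(message)
    if pending:
        merged.append({"role": "tool", "content": str(pending)})
    return merged


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer), using the handler's fallback text when </think> is absent."""
    if "</think>" not in text:
        return "", "response outputs too long or no slash think in response."
    parts = text.split("</think>")
    reasoning = parts[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    return reasoning, parts[-1].lstrip("\n")


history = [
    {"role": "user", "content": "hi"},
    {"role": "tool", "content": "a"},
    {"role": "tool", "content": "b"},
    {"role": "assistant", "content": "done"},
]
print(merge_tool_messages(history))
print(leftover_tokens(240000))
print(split_reasoning("plan\n</think>\n[f(x=1)]"))
```

Keeping these steps free of `self` makes them easy to unit-test separately from the completions call, which needs a running inference endpoint.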
+ -----
+
+ ## 🔗 Related Projects and Citation
+
+ This work is part of the open-source project **[AWorld, InclusionAI](https://github.com/inclusionAI/AWorld/)**.
+
+ If you use FunReason-MT in your research, please cite the technical report:
 
+ ```
+ @article{xu2025funreason,
+   title={FunReason-MT Technical Report: Advanced Data Synthesis Solution for Real-world Multi-turn Tool-use},
+   year={2025}
+ }
+ ```
+ ### Contact
 
+ For inquiries, please contact:
 
+ * `bingguanghao7@gmail.com`