## Tool Calling

To enable tool calling, you may need to set tool-call parser options when starting the service. See [deploy_guidance](./deploy_guidance.md) for details.

In Kimi-K2, a tool-calling round consists of the following steps:

- The user passes function descriptions to Kimi-K2.
- Kimi-K2 decides to make a function call and returns the information needed for the call to the user.
- The user performs the function call, collects the result, and passes it back to Kimi-K2.
- Kimi-K2 continues generating based on the function call results until it judges that it has enough information to respond to the user.
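
The steps above can be pictured as the message list growing over one round trip. The following is a minimal sketch with hand-written placeholder messages, not real model output. The tool-call ID uses the `functions.{func_name}:{idx}` format that Kimi-K2 expects, and the argument and result values are purely illustrative.

```python
import json

# Step 1: the user question (tool descriptions are sent alongside the messages, not inside them).
messages = [{"role": "user", "content": "What's the weather like in Beijing today?"}]

# Step 2: the model answers with a tool call instead of a final reply (hand-written placeholder).
messages.append({
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "id": "functions.get_weather:0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": json.dumps({"city": "Beijing"})},
    }],
})

# Step 3: the caller runs the function and appends the result with role="tool".
messages.append({
    "role": "tool",
    "tool_call_id": "functions.get_weather:0",
    "name": "get_weather",
    "content": json.dumps({"weather": "Sunny"}),
})

# Step 4: this updated list is what gets sent back so the model can write its final answer.
roles = [m["role"] for m in messages]
print(roles)
```

Note that the tool result carries the same ID as the call it answers, which is how the two are linked.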

### Preparing Tools

Suppose we have a function `get_weather` that queries weather conditions in real time. It accepts a city name as a parameter and returns the weather conditions. We need to prepare a structured description of it so that Kimi-K2 can understand what it does.

```python
def get_weather(city: str) -> dict:
    # Stub implementation; a real tool would query a weather service
    return {"weather": "Sunny"}

# Collect the tool descriptions in tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information. Call this tool when the user needs to get weather information",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name",
                }
            }
        }
    }
}]

# Tool name->object mapping for easy calling later
tool_map = {
    "get_weather": get_weather
}
```

### Chat with tools

We use `openai.OpenAI` to send messages to Kimi-K2 along with the tool descriptions. Kimi-K2 decides autonomously whether and how to use the provided tools.
If Kimi-K2 decides that a tool call is needed, it returns a result with `finish_reason='tool_calls'`, and the result includes the tool call information.
After calling the tools with that information, we append the tool call results to the chat history and call Kimi-K2 again.
Kimi-K2 may call tools several times before it judges that the current results can answer the user's question, so we keep looping until `finish_reason` is no longer `tool_calls`.

The results obtained from calling the tools should be added to `messages` with `role='tool'`.

```python
import json
from openai import OpenAI

model_name = 'moonshotai/Kimi-K2-Instruct'
# `endpoint` is the base URL of your OpenAI-compatible server; define it before running
client = OpenAI(base_url=endpoint,
                api_key='xxx')

messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
finish_reason = None
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.3,
        tools=tools,
        tool_choice="auto",
    )
    choice = completion.choices[0]
    finish_reason = choice.finish_reason
    # Note: The finish_reason when tool calls end may vary across different engines, so this condition check needs to be adjusted accordingly
    if finish_reason == "tool_calls":
        # Keep the assistant message that carries the tool calls in the history
        messages.append(choice.message)
        for tool_call in choice.message.tool_calls:
            tool_call_name = tool_call.function.name
            tool_call_arguments = json.loads(tool_call.function.arguments)
            tool_function = tool_map[tool_call_name]
            # The arguments are a JSON object; unpack them as keyword arguments
            tool_result = tool_function(**tool_call_arguments)
            print("tool_result", tool_result)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call_name,
                "content": json.dumps(tool_result),
            })
print('-' * 100)
print(choice.message.content)
```

### Tool Calling in Streaming Mode

Tool calling also works in streaming mode. In this case, the tool call information arrives in fragments spread across stream chunks, so we accumulate the fragments by `index` until each tool call is complete. Refer to the code below:

```python
# Reuses json, client, model_name, tools and tool_map from the previous examples
messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
finish_reason = None
msg = ''
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.3,
        tools=tools,
        tool_choice="auto",
        stream=True
    )
    tool_calls = []
    for chunk in completion:
        delta = chunk.choices[0].delta
        if delta.content:
            msg += delta.content
        if delta.tool_calls:
            for tool_call_chunk in delta.tool_calls:
                if tool_call_chunk.index is not None:
                    # Extend the tool_calls list
                    while len(tool_calls) <= tool_call_chunk.index:
                        tool_calls.append({
                            "id": "",
                            "type": "function",
                            "function": {
                                "name": "",
                                "arguments": ""
                            }
                        })

                    tc = tool_calls[tool_call_chunk.index]

                    # Each field may arrive in pieces; concatenate them
                    if tool_call_chunk.id:
                        tc["id"] += tool_call_chunk.id
                    if tool_call_chunk.function.name:
                        tc["function"]["name"] += tool_call_chunk.function.name
                    if tool_call_chunk.function.arguments:
                        tc["function"]["arguments"] += tool_call_chunk.function.arguments

        finish_reason = chunk.choices[0].finish_reason
    # Note: The finish_reason when tool calls end may vary across different engines, so this condition check needs to be adjusted accordingly
    if finish_reason == "tool_calls":
        # Record the assistant turn that issued the tool calls before adding the results
        messages.append({"role": "assistant", "content": msg, "tool_calls": tool_calls})
        for tool_call in tool_calls:
            tool_call_name = tool_call['function']['name']
            tool_call_arguments = json.loads(tool_call['function']['arguments'])
            tool_function = tool_map[tool_call_name]
            # The arguments are a JSON object; unpack them as keyword arguments
            tool_result = tool_function(**tool_call_arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call['id'],
                "name": tool_call_name,
                "content": json.dumps(tool_result),
            })
        # The text generated alongside the tool call is not the final answer, so reset msg
        msg = ''

print(msg)
```

### Manually Parsing Tool Calls

The tool call requests generated by Kimi-K2 can also be parsed manually, which is especially useful when the service you are using does not provide a tool-call parser.
The tool call requests are wrapped in `<|tool_calls_section_begin|>` and `<|tool_calls_section_end|>`, and each individual tool call is wrapped in `<|tool_call_begin|>` and `<|tool_call_end|>`. Within a tool call, the tool ID and the arguments are separated by `<|tool_call_argument_begin|>`.
The tool ID has the format `functions.{func_name}:{idx}`, from which we can parse the function name.
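
To make this layout concrete, the sketch below assembles a hand-written string in the format just described (it is not real model output, and the argument values are illustrative) and splits one tool ID into its parts. `split_tool_id` is a hypothetical helper introduced only for this illustration.

```python
def split_tool_id(tool_id: str) -> tuple[str, int]:
    """Split 'functions.{func_name}:{idx}' into (func_name, idx)."""
    prefix_and_name, idx = tool_id.rsplit(":", 1)
    prefix, func_name = prefix_and_name.split(".", 1)
    if prefix != "functions":
        raise ValueError(f"unexpected tool ID prefix: {tool_id!r}")
    return func_name, int(idx)

# Hand-written example of a single wrapped tool call
raw_output = (
    "<|tool_calls_section_begin|>"
    "<|tool_call_begin|>functions.get_weather:0"
    '<|tool_call_argument_begin|>{"city": "Beijing"}'
    "<|tool_call_end|>"
    "<|tool_calls_section_end|>"
)

# The tool ID is the text between the call-begin and argument-begin tokens
tool_id = raw_output.split("<|tool_call_begin|>")[1].split("<|tool_call_argument_begin|>")[0]
func_name, idx = split_tool_id(tool_id)
print(func_name, idx)
```

Splitting on the last `:` and the first `.` keeps the helper usable for function names that themselves contain dots.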

Based on the above rules, we can post requests directly to the completions interface and parse the tool calls manually.

```python
import json
import requests
from transformers import AutoTokenizer

# Reuses model_name, endpoint, tools and tool_map from the previous examples
messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
msg = ''
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
while True:
    # Render the chat history and tool descriptions into a raw prompt
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        tools=tools,
        add_generation_prompt=True,
    )
    payload = {
        "model": model_name,
        "prompt": text,
        "max_tokens": 512
    }
    response = requests.post(
        f"{endpoint}/completions",
        headers={"Content-Type": "application/json"},
        json=payload,
        stream=False,
    )
    raw_out = response.json()

    raw_output = raw_out["choices"][0]["text"]
    tool_calls = extract_tool_call_info(raw_output)
    if len(tool_calls) == 0:
        # No tool calls
        msg = raw_output
        break
    else:
        # Record the assistant turn (text before the tool-call section) and its tool calls
        messages.append({
            "role": "assistant",
            "content": raw_output.split('<|tool_calls_section_begin|>')[0],
            "tool_calls": tool_calls,
        })
        for tool_call in tool_calls:
            tool_call_name = tool_call['function']['name']
            tool_call_arguments = json.loads(tool_call['function']['arguments'])
            tool_function = tool_map[tool_call_name]
            # The arguments are a JSON object; unpack them as keyword arguments
            tool_result = tool_function(**tool_call_arguments)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call['id'],
                "name": tool_call_name,
                "content": json.dumps(tool_result),
            })
print('-' * 100)
print(msg)
```

Here, `extract_tool_call_info` parses the model output and returns the tool call information. A simple implementation would be:

```python
import re

def extract_tool_call_info(tool_call_rsp: str):
    if '<|tool_calls_section_begin|>' not in tool_call_rsp:
        # No tool calls
        return []
    pattern = r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>"

    tool_calls_sections = re.findall(pattern, tool_call_rsp, re.DOTALL)
    if not tool_calls_sections:
        # The section was opened but never closed (for example, truncated output)
        return []

    # Extract multiple tool calls
    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
    tool_calls = []
    for match in re.findall(func_call_pattern, tool_calls_sections[0], re.DOTALL):
        function_id, function_args = match
        # function_id: functions.get_weather:0
        # Drop the ":idx" suffix, then the "functions." prefix (names may contain dots)
        function_name = function_id.split(':')[0].split('.', 1)[1]
        tool_calls.append(
            {
                "id": function_id,
                "type": "function",
                "function": {
                    "name": function_name,
                    "arguments": function_args
                }
            }
        )
    return tool_calls
```

## FAQ

#### Q1: I received special tokens like '<|tool_call_begin|>' in the 'content' field instead of a normal tool_call.

This indicates a tool-call failure, which most often occurs in multi-turn tool-calling scenarios because of an incorrect tool-call ID. K2 expects the ID to follow the format `functions.func_name:idx`, where `functions` is a fixed string, `func_name` is the actual function name (for example, `get_weather`), and `idx` is a global counter that starts at 0 and increments with each function invocation.
Please check all tool-call IDs in the message list.
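
One way to run this check is a small validator over the message list. The sketch below is illustrative only: it assumes plain-dict messages in the OpenAI chat format, it reads "global counter" as a single index sequence across the whole conversation, and `find_bad_tool_call_ids` is a hypothetical helper rather than part of any library.

```python
import re

ID_PATTERN = re.compile(r"^functions\.(?P<name>[\w.\-]+):(?P<idx>\d+)$")

def find_bad_tool_call_ids(messages: list[dict]) -> list[str]:
    """Describe every tool-call ID that breaks the functions.func_name:idx format."""
    problems = []
    expected_idx = 0  # assumption: one counter shared by all tool calls in the conversation
    for message in messages:
        if message.get("role") != "assistant":
            continue
        for tool_call in message.get("tool_calls") or []:
            tool_id = tool_call["id"]
            name = tool_call["function"]["name"]
            match = ID_PATTERN.match(tool_id)
            if match is None:
                problems.append(f"{tool_id!r}: does not match functions.func_name:idx")
            elif match["name"] != name:
                problems.append(f"{tool_id!r}: name differs from function name {name!r}")
            elif int(match["idx"]) != expected_idx:
                problems.append(f"{tool_id!r}: expected index {expected_idx}")
            expected_idx += 1
    return problems

# Hand-written history with a random-looking ID of the kind some servers generate
history = [
    {"role": "user", "content": "Weather in Beijing?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": '{"weather": "Sunny"}'},
]
print(find_bad_tool_call_ids(history))
```

An empty list means every ID in the history already follows the expected format.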

#### Q2: My tool-call ID is incorrect. How can I fix it?

First, make sure your code and chat template are up to date with the latest version from the Hugging Face repo.
If you're using vLLM or SGLang and they generate random tool-call IDs, upgrade them to the latest release. For other frameworks, you must either parse the tool-call ID from the model output and set it correctly in the server-side response, or rewrite every tool-call ID on the client side according to the rules above before sending the messages to Kimi-K2.
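
For the client-side option, the following sketch rewrites the IDs in a copy of the message list before it is sent. It assumes plain-dict messages in the OpenAI chat format and that every `tool` message refers to an ID issued by an earlier assistant message. `rewrite_tool_call_ids` is a hypothetical helper, not an API of any framework.

```python
import copy

def rewrite_tool_call_ids(messages: list[dict]) -> list[dict]:
    """Return a copy of messages whose tool-call IDs follow functions.func_name:idx."""
    fixed = copy.deepcopy(messages)
    id_map = {}   # old ID -> new ID, so tool results stay linked to their calls
    counter = 0   # global counter: starts at 0 and increments with each tool call
    for message in fixed:
        if message.get("role") == "assistant":
            for tool_call in message.get("tool_calls") or []:
                new_id = f"functions.{tool_call['function']['name']}:{counter}"
                id_map[tool_call["id"]] = new_id
                tool_call["id"] = new_id
                counter += 1
        elif message.get("role") == "tool":
            old_id = message.get("tool_call_id")
            # Leave unknown IDs untouched rather than guessing
            message["tool_call_id"] = id_map.get(old_id, old_id)
    return fixed

# Hand-written history with a random-looking ID
history = [
    {"role": "user", "content": "Weather in Beijing?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": '{"weather": "Sunny"}'},
]
fixed = rewrite_tool_call_ids(history)
print(fixed[1]["tool_calls"][0]["id"], fixed[2]["tool_call_id"])
```

Working on a deep copy leaves the original history unchanged, which helps if the server-issued IDs are still needed elsewhere.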

#### Q3: My tool-call ID is correct, but tool calling still fails in multi-turn conversations.

Please describe your situation in the [discussion](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905/discussions).