| ================================================================================ |
| HERMES 4.5 CHAT TEMPLATE TEST RESULTS |
| Generated: 2026-01-25T00:00:11.835699 |
| Template: /home/teknium/datasets/hermes-4.5-chat_template.jinja |
| ================================================================================ |
|
|
| -------------------------------------------------------------------------------- |
| TEST: 1. Simple user message only |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Hello, how are you?" |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Hello, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 2. Simple user-assistant exchange |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What is 2 + 2?" |
| }, |
| { |
| "role": "assistant", |
| "content": "The answer is 4." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| What is 2 + 2?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| The answer is 4.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 3. Multi-turn conversation |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Hi there!" |
| }, |
| { |
| "role": "assistant", |
| "content": "Hello! How can I help you today?" |
| }, |
| { |
| "role": "user", |
| "content": "Tell me a joke." |
| }, |
| { |
| "role": "assistant", |
| "content": "Why did the scarecrow win an award? Because he was outstanding in his field!" |
| }, |
| { |
| "role": "user", |
| "content": "That was funny!" |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Hi there!<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| Hello! How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Tell me a joke.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| Why did the scarecrow win an award? Because he was outstanding in his field!<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| That was funny!<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 4. With system message |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a helpful coding assistant." |
| }, |
| { |
| "role": "user", |
| "content": "Write a Python hello world." |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
| You are a helpful coding assistant.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Write a Python hello world.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 5. No system message (should just output bos_token) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What's the weather like?" |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the weather like?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 6. System message mid-conversation (edge case) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a pirate." |
| }, |
| { |
| "role": "user", |
| "content": "Hello!" |
| }, |
| { |
| "role": "assistant", |
| "content": "Ahoy there, matey!" |
| }, |
| { |
| "role": "system", |
| "content": "Now you are a robot." |
| }, |
| { |
| "role": "user", |
| "content": "Hello again!" |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
| You are a pirate.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| Ahoy there, matey!<|eot_id|><|start_header_id|>system<|end_header_id|> |
|
|
| Now you are a robot.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Hello again!<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 7. Assistant with reasoning_content field (keep_thinking=True) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What is the square root of 144?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "The user is asking for the square root of 144. I know that 12 * 12 = 144, so the square root of 144 is 12.", |
| "content": "The square root of 144 is 12." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| What is the square root of 144?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>The user is asking for the square root of 144. I know that 12 * 12 = 144, so the square root of 144 is 12.</think>The square root of 144 is 12.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 8. Assistant with reasoning_content field (keep_thinking=False) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What is the square root of 144?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "The user is asking for the square root of 144. I know that 12 * 12 = 144, so the square root of 144 is 12.", |
| "content": "The square root of 144 is 12." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = False |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| What is the square root of 144?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| The square root of 144 is 12.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 9. Multi-turn with reasoning_content in history |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What is 5 factorial?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "5! = 5 * 4 * 3 * 2 * 1 = 120", |
| "content": "5 factorial (5!) equals 120." |
| }, |
| { |
| "role": "user", |
| "content": "And 6 factorial?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "6! = 6 * 5! = 6 * 120 = 720", |
| "content": "6 factorial (6!) equals 720." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| What is 5 factorial?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>5! = 5 * 4 * 3 * 2 * 1 = 120</think>5 factorial (5!) equals 120.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| And 6 factorial?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>6! = 6 * 5! = 6 * 120 = 720</think>6 factorial (6!) equals 720.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 10. Assistant with inline <think> tags (keep_thinking=True) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Solve: 15 * 7" |
| }, |
| { |
| "role": "assistant", |
| "content": "<think>Let me calculate 15 * 7. I can break this down: 15 * 7 = (10 * 7) + (5 * 7) = 70 + 35 = 105.</think>The answer is 105." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Solve: 15 * 7<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Let me calculate 15 * 7. I can break this down: 15 * 7 = (10 * 7) + (5 * 7) = 70 + 35 = 105.</think>The answer is 105.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 11. Assistant with inline <think> tags (keep_thinking=False) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Solve: 15 * 7" |
| }, |
| { |
| "role": "assistant", |
| "content": "<think>Let me calculate 15 * 7. I can break this down: 15 * 7 = (10 * 7) + (5 * 7) = 70 + 35 = 105.</think>The answer is 105." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = False |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Solve: 15 * 7<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| The answer is 105.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 12. Inline think tags with newlines |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What's the capital of France?" |
| }, |
| { |
| "role": "assistant", |
| "content": "<think>\nThe user is asking about France's capital.\nParis is the capital of France.\nThis is a straightforward geography question.\n</think>\nThe capital of France is Paris." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>The user is asking about France's capital. |
| Paris is the capital of France. |
| This is a straightforward geography question.</think>The capital of France is Paris.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 13. Content without opening <think> but with </think> (edge case) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Hello" |
| }, |
| { |
| "role": "assistant", |
| "content": "Some reasoning here</think>And here is the actual response." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Some reasoning here</think>And here is the actual response.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 14. Tools provided with system message |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a helpful assistant." |
| }, |
| { |
| "role": "user", |
| "content": "What's the weather in Tokyo?" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get the current weather in a location", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "location": { |
| "type": "string", |
| "description": "City name" |
| }, |
| "unit": { |
| "type": "string", |
| "enum": [ |
| "celsius", |
| "fahrenheit" |
| ] |
| } |
| }, |
| "required": [ |
| "location" |
| ] |
| } |
| } |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a helpful assistant. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get the current weather in a location", "name": "get_weather", "parameters": {"properties": {"location": {"description": "City name", "type": "string"}, "unit": {"enum": ["celsius", "fahrenheit"], "type": "string"}}, "required": ["location"], "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the weather in Tokyo?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 15. Tools provided without system message |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Read the file /tmp/test.txt" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "read_file", |
| "description": "Read contents of a file", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "path": { |
| "type": "string", |
| "description": "File path" |
| } |
| }, |
| "required": [ |
| "path" |
| ] |
| } |
| } |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Read contents of a file", "name": "read_file", "parameters": {"properties": {"path": {"description": "File path", "type": "string"}}, "required": ["path"], "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Read the file /tmp/test.txt<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 16. Assistant making a tool call |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What's the weather in London?" |
| }, |
| { |
| "role": "assistant", |
| "content": "I'll check the weather for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "London", |
| "unit": "celsius" |
| } |
| } |
| } |
| ] |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get weather", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "location": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get weather", "name": "get_weather", "parameters": {"properties": {"location": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the weather in London?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| I'll check the weather for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "London", "unit": "celsius"}} |
| </tool_call><|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 17. Assistant tool call with reasoning_content |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Check the weather in Paris" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants weather information for Paris. I should use the get_weather function.", |
| "content": "Let me check that for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Paris" |
| } |
| } |
| } |
| ] |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get weather", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "location": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get weather", "name": "get_weather", "parameters": {"properties": {"location": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Check the weather in Paris<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>The user wants weather information for Paris. I should use the get_weather function.</think>Let me check that for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Paris"}} |
| </tool_call><|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 18. Multiple tool calls in one message |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Get weather for Tokyo and New York" |
| }, |
| { |
| "role": "assistant", |
| "content": "I'll check both locations.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Tokyo" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "New York" |
| } |
| } |
| } |
| ] |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get weather", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "location": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get weather", "name": "get_weather", "parameters": {"properties": {"location": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Get weather for Tokyo and New York<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| I'll check both locations. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Tokyo"}} |
| </tool_call> |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "New York"}} |
| </tool_call><|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 19. Single tool response |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "What's the weather?" |
| }, |
| { |
| "role": "assistant", |
| "content": "", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "NYC" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"temperature\": 72, \"condition\": \"sunny\"}" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get weather", |
| "parameters": {} |
| } |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get weather", "name": "get_weather", "parameters": {}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the weather?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "NYC"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"temperature": 72, "condition": "sunny"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 20. Multiple consecutive tool responses (should batch under one user header) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Compare weather in Tokyo and London" |
| }, |
| { |
| "role": "assistant", |
| "content": "", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Tokyo" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "London" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"temperature\": 25, \"condition\": \"cloudy\"}" |
| }, |
| { |
| "role": "tool", |
| "content": "{\"temperature\": 18, \"condition\": \"rainy\"}" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get weather", |
| "parameters": {} |
| } |
| } |
| ] |
| add_generation_prompt = True |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get weather", "name": "get_weather", "parameters": {}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Compare weather in Tokyo and London<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Tokyo"}} |
| </tool_call> |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "London"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"temperature": 25, "condition": "cloudy"} |
| </tool_response> |
| <tool_response> |
| {"temperature": 18, "condition": "rainy"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 21. Full tool calling flow (call -> response -> final answer) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a helpful assistant with tool access." |
| }, |
| { |
| "role": "user", |
| "content": "What's in /tmp/data.txt?" |
| }, |
| { |
| "role": "assistant", |
| "content": "I'll read that file for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "read_file", |
| "arguments": { |
| "path": "/tmp/data.txt" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "Hello, World!" |
| }, |
| { |
| "role": "assistant", |
| "content": "The file contains: Hello, World!" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "read_file", |
| "description": "Read file", |
| "parameters": {} |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a helpful assistant with tool access. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Read file", "name": "read_file", "parameters": {}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What's in /tmp/data.txt?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| I'll read that file for you. |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/tmp/data.txt"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| Hello, World! |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| The file contains: Hello, World!<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 22. Assistant with empty content (only tool_calls) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Get the time" |
| }, |
| { |
| "role": "assistant", |
| "content": "", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_time", |
| "arguments": {} |
| } |
| } |
| ] |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_time", |
| "description": "Get time", |
| "parameters": {} |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get time", "name": "get_time", "parameters": {}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Get the time<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <tool_call> |
| {"name": "get_time", "arguments": {}} |
| </tool_call><|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 23. Assistant with None content |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Hello" |
| }, |
| { |
| "role": "assistant", |
| "content": null |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
| <|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 24. Assistant with only reasoning_content (no visible content) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Think about life" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Deep thoughts about existence and meaning...", |
| "content": "" |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Think about life<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Deep thoughts about existence and meaning...</think><|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 25. Very long reasoning content |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Solve a complex problem" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Step 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nStep 1: Analyze the problem\nFinal conclusion reached.", |
| "content": "Here's the solution." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Solve a complex problem<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Step 1: Analyze the problem |
| Final conclusion reached.</think>Here's the solution.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 26. Content with special characters |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Show me code with <tags> and \"quotes\"" |
| }, |
| { |
| "role": "assistant", |
| "content": "Here's code:\n```python\nprint(\"<hello>\")\n```" |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Show me code with <tags> and "quotes"<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| Here's code: |
| ```python |
| print("<hello>") |
| ```<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 27. Unicode content |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Say hello in Japanese" |
| }, |
| { |
| "role": "assistant", |
| "content": "こんにちは (Konnichiwa) means hello in Japanese! 🇯🇵" |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Say hello in Japanese<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| こんにちは (Konnichiwa) means hello in Japanese! 🇯🇵<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 28. Both reasoning_content AND inline think tags (reasoning_content should take precedence) |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "user", |
| "content": "Test priority" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "This is from reasoning_content field", |
| "content": "<think>This is inline and should be ignored</think>The actual response." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>user<|end_header_id|> |
|
|
| Test priority<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>This is from reasoning_content field</think>The actual response.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 29. Full conversation: Weather assistant with multiple tool calls |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a helpful weather assistant. Use tools to get accurate weather data." |
| }, |
| { |
| "role": "user", |
| "content": "What's the weather like in Tokyo and London right now?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants weather for two cities. I need to call get_weather twice.", |
| "content": "I'll check the weather in both cities for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Tokyo", |
| "unit": "celsius" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "London", |
| "unit": "celsius" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"location\": \"Tokyo\", \"temperature\": 22, \"condition\": \"Partly cloudy\", \"humidity\": 65}" |
| }, |
| { |
| "role": "tool", |
| "content": "{\"location\": \"London\", \"temperature\": 15, \"condition\": \"Rainy\", \"humidity\": 80}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Got both results. Tokyo is warmer and partly cloudy, London is cooler and rainy.", |
| "content": "Here's the current weather:\n\n**Tokyo**: 22°C, Partly cloudy, 65% humidity\n**London**: 15°C, Rainy, 80% humidity\n\nTokyo is noticeably warmer today!" |
| }, |
| { |
| "role": "user", |
| "content": "Which city should I visit if I want good weather?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Based on the weather data, Tokyo has better conditions - warmer and not raining.", |
| "content": "Based on the current conditions, I'd recommend **Tokyo**! It's warmer at 22°C with partly cloudy skies, while London is experiencing rain. Tokyo would be much more pleasant for outdoor activities." |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "get_weather", |
| "description": "Get current weather for a location", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "location": { |
| "type": "string", |
| "description": "City name" |
| }, |
| "unit": { |
| "type": "string", |
| "enum": [ |
| "celsius", |
| "fahrenheit" |
| ] |
| } |
| }, |
| "required": [ |
| "location" |
| ] |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a helpful weather assistant. Use tools to get accurate weather data. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Get current weather for a location", "name": "get_weather", "parameters": {"properties": {"location": {"description": "City name", "type": "string"}, "unit": {"enum": ["celsius", "fahrenheit"], "type": "string"}}, "required": ["location"], "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the weather like in Tokyo and London right now?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>The user wants weather for two cities. I need to call get_weather twice.</think>I'll check the weather in both cities for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Tokyo", "unit": "celsius"}} |
| </tool_call> |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "London", "unit": "celsius"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"location": "Tokyo", "temperature": 22, "condition": "Partly cloudy", "humidity": 65} |
| </tool_response> |
| <tool_response> |
| {"location": "London", "temperature": 15, "condition": "Rainy", "humidity": 80} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Got both results. Tokyo is warmer and partly cloudy, London is cooler and rainy.</think>Here's the current weather: |
|
|
| **Tokyo**: 22°C, Partly cloudy, 65% humidity |
| **London**: 15°C, Rainy, 80% humidity |
|
|
| Tokyo is noticeably warmer today!<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Which city should I visit if I want good weather?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Based on the weather data, Tokyo has better conditions - warmer and not raining.</think>Based on the current conditions, I'd recommend **Tokyo**! It's warmer at 22°C with partly cloudy skies, while London is experiencing rain. Tokyo would be much more pleasant for outdoor activities.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 30. Full conversation: Code assistant reading and modifying files |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a coding assistant with file system access." |
| }, |
| { |
| "role": "user", |
| "content": "Can you check what's in my config.json file and tell me if there are any issues?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "User wants me to read config.json and analyze it for issues.", |
| "content": "I'll read the config file for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "read_file", |
| "arguments": { |
| "path": "config.json" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\n \"api_key\": \"sk-12345\",\n \"debug\": true,\n \"port\": \"8080\",\n \"max_connections\": 100\n}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "I see a few issues: 1) API key is exposed in config (security issue), 2) port is a string but should probably be an integer, 3) debug is true which shouldn't be in production.", |
| "content": "I found a few potential issues in your config.json:\n\n1. **Security Risk**: Your `api_key` is hardcoded in the config file. Consider using environment variables instead.\n2. **Type Issue**: `port` is a string (\"8080\") but should typically be an integer (8080).\n3. **Debug Mode**: `debug` is set to `true`. Make sure to disable this in production.\n\nWould you like me to help fix any of these issues?" |
| }, |
| { |
| "role": "user", |
| "content": "Yes, please fix the port type issue" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "User wants me to fix the port type. I'll write the corrected config.", |
| "content": "I'll update the config file to fix the port type.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "write_file", |
| "arguments": { |
| "path": "config.json", |
| "content": "{\n \"api_key\": \"sk-12345\",\n \"debug\": true,\n \"port\": 8080,\n \"max_connections\": 100\n}" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"success\": true, \"message\": \"File written successfully\"}" |
| }, |
| { |
| "role": "assistant", |
| "content": "Done! I've updated `config.json` - the `port` value is now an integer (8080) instead of a string. The other issues (API key exposure and debug mode) are still there if you'd like me to address those as well." |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "read_file", |
| "description": "Read a file", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "path": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| }, |
| { |
| "type": "function", |
| "function": { |
| "name": "write_file", |
| "description": "Write a file", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "path": { |
| "type": "string" |
| }, |
| "content": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a coding assistant with file system access. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Read a file", "name": "read_file", "parameters": {"properties": {"path": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| {"function": {"description": "Write a file", "name": "write_file", "parameters": {"properties": {"content": {"type": "string"}, "path": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Can you check what's in my config.json file and tell me if there are any issues?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>User wants me to read config.json and analyze it for issues.</think>I'll read the config file for you. |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "config.json"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| { |
| "api_key": "sk-12345", |
| "debug": true, |
| "port": "8080", |
| "max_connections": 100 |
| } |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>I see a few issues: 1) API key is exposed in config (security issue), 2) port is a string but should probably be an integer, 3) debug is true which shouldn't be in production.</think>I found a few potential issues in your config.json: |
|
|
| 1. **Security Risk**: Your `api_key` is hardcoded in the config file. Consider using environment variables instead. |
| 2. **Type Issue**: `port` is a string ("8080") but should typically be an integer (8080). |
| 3. **Debug Mode**: `debug` is set to `true`. Make sure to disable this in production. |
|
|
| Would you like me to help fix any of these issues?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Yes, please fix the port type issue<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>User wants me to fix the port type. I'll write the corrected config.</think>I'll update the config file to fix the port type. |
| <tool_call> |
| {"name": "write_file", "arguments": {"content": "{\n \"api_key\": \"sk-12345\",\n \"debug\": true,\n \"port\": 8080,\n \"max_connections\": 100\n}", "path": "config.json"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"success": true, "message": "File written successfully"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| Done! I've updated `config.json` - the `port` value is now an integer (8080) instead of a string. The other issues (API key exposure and debug mode) are still there if you'd like me to address those as well.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 31. Full conversation: Math tutor with step-by-step reasoning |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a patient math tutor. Show your reasoning step by step." |
| }, |
| { |
| "role": "user", |
| "content": "How do I solve the quadratic equation x² + 5x + 6 = 0?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "This is a quadratic equation. I can solve it by factoring since the coefficients are small integers. I need two numbers that multiply to 6 and add to 5. Those numbers are 2 and 3: 2 × 3 = 6, 2 + 3 = 5.", |
| "content": "Great question! Let's solve x² + 5x + 6 = 0 by factoring.\n\n**Step 1**: We need two numbers that:\n- Multiply to give 6 (the constant term)\n- Add to give 5 (the coefficient of x)\n\n**Step 2**: Those numbers are 2 and 3 because:\n- 2 × 3 = 6 ✓\n- 2 + 3 = 5 ✓\n\n**Step 3**: Factor the equation:\nx² + 5x + 6 = (x + 2)(x + 3) = 0\n\n**Step 4**: Set each factor equal to zero:\n- x + 2 = 0 → x = -2\n- x + 3 = 0 → x = -3\n\n**Solution**: x = -2 or x = -3" |
| }, |
| { |
| "role": "user", |
| "content": "Can you verify those answers are correct?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "I'll substitute each solution back into the original equation to verify. For x = -2: (-2)² + 5(-2) + 6 = 4 - 10 + 6 = 0 ✓. For x = -3: (-3)² + 5(-3) + 6 = 9 - 15 + 6 = 0 ✓. Both check out!", |
| "content": "Absolutely! Let's verify by substituting each answer back into the original equation:\n\n**For x = -2:**\n(-2)² + 5(-2) + 6 = 4 - 10 + 6 = 0 ✓\n\n**For x = -3:**\n(-3)² + 5(-3) + 6 = 9 - 15 + 6 = 0 ✓\n\nBoth solutions check out! This confirms x = -2 and x = -3 are correct." |
| }, |
| { |
| "role": "user", |
| "content": "What if the equation can't be factored easily?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Good follow-up question. When factoring is difficult, we use the quadratic formula: x = (-b ± √(b²-4ac)) / 2a. I should explain this method.", |
| "content": "Great question! When factoring isn't obvious, use the **quadratic formula**:\n\nFor ax² + bx + c = 0:\n\n**x = (-b ± √(b² - 4ac)) / 2a**\n\nUsing our equation (a=1, b=5, c=6):\n- x = (-5 ± √(25 - 24)) / 2\n- x = (-5 ± √1) / 2\n- x = (-5 ± 1) / 2\n\nThis gives us:\n- x = (-5 + 1) / 2 = -2\n- x = (-5 - 1) / 2 = -3\n\nSame answers! The quadratic formula always works, even when the roots aren't nice integers." |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
| You are a patient math tutor. Show your reasoning step by step.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| How do I solve the quadratic equation x² + 5x + 6 = 0?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>This is a quadratic equation. I can solve it by factoring since the coefficients are small integers. I need two numbers that multiply to 6 and add to 5. Those numbers are 2 and 3: 2 × 3 = 6, 2 + 3 = 5.</think>Great question! Let's solve x² + 5x + 6 = 0 by factoring. |
|
|
| **Step 1**: We need two numbers that: |
| - Multiply to give 6 (the constant term) |
| - Add to give 5 (the coefficient of x) |
|
|
| **Step 2**: Those numbers are 2 and 3 because: |
| - 2 × 3 = 6 ✓ |
| - 2 + 3 = 5 ✓ |
|
|
| **Step 3**: Factor the equation: |
| x² + 5x + 6 = (x + 2)(x + 3) = 0 |
|
|
| **Step 4**: Set each factor equal to zero: |
| - x + 2 = 0 → x = -2 |
| - x + 3 = 0 → x = -3 |
|
|
| **Solution**: x = -2 or x = -3<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Can you verify those answers are correct?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>I'll substitute each solution back into the original equation to verify. For x = -2: (-2)² + 5(-2) + 6 = 4 - 10 + 6 = 0 ✓. For x = -3: (-3)² + 5(-3) + 6 = 9 - 15 + 6 = 0 ✓. Both check out!</think>Absolutely! Let's verify by substituting each answer back into the original equation: |
|
|
| **For x = -2:** |
| (-2)² + 5(-2) + 6 = 4 - 10 + 6 = 0 ✓ |
|
|
| **For x = -3:** |
| (-3)² + 5(-3) + 6 = 9 - 15 + 6 = 0 ✓ |
|
|
| Both solutions check out! This confirms x = -2 and x = -3 are correct.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What if the equation can't be factored easily?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Good follow-up question. When factoring is difficult, we use the quadratic formula: x = (-b ± √(b²-4ac)) / 2a. I should explain this method.</think>Great question! When factoring isn't obvious, use the **quadratic formula**: |
|
|
| For ax² + bx + c = 0: |
|
|
| **x = (-b ± √(b² - 4ac)) / 2a** |
|
|
| Using our equation (a=1, b=5, c=6): |
| - x = (-5 ± √(25 - 24)) / 2 |
| - x = (-5 ± √1) / 2 |
| - x = (-5 ± 1) / 2 |
|
|
| This gives us: |
| - x = (-5 + 1) / 2 = -2 |
| - x = (-5 - 1) / 2 = -3 |
|
|
| Same answers! The quadratic formula always works, even when the roots aren't nice integers.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 32. Full conversation: Research assistant with web search and calculations |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a research assistant with access to web search and a calculator." |
| }, |
| { |
| "role": "user", |
| "content": "What's the population of France and Germany, and what's the total combined?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "User wants population data for two countries and a sum. I'll search for each country's population.", |
| "content": "Let me look that up for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "web_search", |
| "arguments": { |
| "query": "France population 2024" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "web_search", |
| "arguments": { |
| "query": "Germany population 2024" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"result\": \"France population 2024: approximately 68.2 million\"}" |
| }, |
| { |
| "role": "tool", |
| "content": "{\"result\": \"Germany population 2024: approximately 84.5 million\"}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Got the populations. France: 68.2M, Germany: 84.5M. Now I need to calculate the sum.", |
| "content": "I found the populations. Let me calculate the total.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "calculator", |
| "arguments": { |
| "expression": "68.2 + 84.5" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"result\": 152.7}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Sum is 152.7 million. Germany has about 16 million more people than France.", |
| "content": "Here are the population figures:\n\n| Country | Population (2024) |\n|---------|------------------|\n| France | ~68.2 million |\n| Germany | ~84.5 million |\n| **Combined** | **~152.7 million** |\n\nGermany has about 16.3 million more people than France, making it the most populous country in the European Union." |
| }, |
| { |
| "role": "user", |
| "content": "What percentage of the EU population is that?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to find EU total population and calculate the percentage.", |
| "content": "Good question! Let me find the EU total population.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "web_search", |
| "arguments": { |
| "query": "European Union total population 2024" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"result\": \"EU population 2024: approximately 448.4 million\"}" |
| }, |
| { |
| "role": "assistant", |
| "content": "I'll calculate the percentage.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "calculator", |
| "arguments": { |
| "expression": "(152.7 / 448.4) * 100" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"result\": 34.05}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "152.7 million out of 448.4 million is about 34%. That's over a third of the EU population in just two countries.", |
| "content": "France and Germany together represent approximately **34%** of the EU's total population (152.7 million out of 448.4 million).\n\nThat's over one-third of the entire European Union in just two countries! This helps explain their significant political and economic influence within the EU." |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "web_search", |
| "description": "Search the web", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "query": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| }, |
| { |
| "type": "function", |
| "function": { |
| "name": "calculator", |
| "description": "Evaluate math expression", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "expression": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a research assistant with access to web search and a calculator. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Search the web", "name": "web_search", "parameters": {"properties": {"query": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| {"function": {"description": "Evaluate math expression", "name": "calculator", "parameters": {"properties": {"expression": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What's the population of France and Germany, and what's the total combined?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>User wants population data for two countries and a sum. I'll search for each country's population.</think>Let me look that up for you. |
| <tool_call> |
| {"name": "web_search", "arguments": {"query": "France population 2024"}} |
| </tool_call> |
| <tool_call> |
| {"name": "web_search", "arguments": {"query": "Germany population 2024"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"result": "France population 2024: approximately 68.2 million"} |
| </tool_response> |
| <tool_response> |
| {"result": "Germany population 2024: approximately 84.5 million"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Got the populations. France: 68.2M, Germany: 84.5M. Now I need to calculate the sum.</think>I found the populations. Let me calculate the total. |
| <tool_call> |
| {"name": "calculator", "arguments": {"expression": "68.2 + 84.5"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"result": 152.7} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Sum is 152.7 million. Germany has about 16 million more people than France.</think>Here are the population figures: |
|
|
| | Country | Population (2024) | |
| |---------|------------------| |
| | France | ~68.2 million | |
| | Germany | ~84.5 million | |
| | **Combined** | **~152.7 million** | |
|
|
| Germany has about 16.3 million more people than France, making it the most populous country in the European Union.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What percentage of the EU population is that?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Need to find EU total population and calculate the percentage.</think>Good question! Let me find the EU total population. |
| <tool_call> |
| {"name": "web_search", "arguments": {"query": "European Union total population 2024"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"result": "EU population 2024: approximately 448.4 million"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| I'll calculate the percentage. |
| <tool_call> |
| {"name": "calculator", "arguments": {"expression": "(152.7 / 448.4) * 100"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"result": 34.05} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>152.7 million out of 448.4 million is about 34%. That's over a third of the EU population in just two countries.</think>France and Germany together represent approximately **34%** of the EU's total population (152.7 million out of 448.4 million). |
|
|
| That's over one-third of the entire European Union in just two countries! This helps explain their significant political and economic influence within the EU.<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 33. Full conversation: Customer service bot handling an order issue |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a helpful customer service agent for TechStore. Be empathetic and solution-oriented." |
| }, |
| { |
| "role": "user", |
| "content": "Hi, my order #12345 hasn't arrived and it's been 2 weeks!" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Customer is frustrated about delayed order. I should empathize first, then look up their order.", |
| "content": "I'm so sorry to hear about this delay! Two weeks is definitely longer than expected. Let me look up your order right away.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "lookup_order", |
| "arguments": { |
| "order_id": "12345" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"order_id\": \"12345\", \"status\": \"shipped\", \"shipped_date\": \"2024-01-10\", \"carrier\": \"FastShip\", \"tracking\": \"FS789456123\", \"items\": [\"Wireless Headphones\"], \"delivery_estimate\": \"2024-01-15\"}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Order was shipped on Jan 10 with estimated delivery Jan 15. It's clearly late. I should check the tracking status.", |
| "content": "I found your order. It was shipped on January 10th with FastShip (tracking: FS789456123) and was estimated to arrive by January 15th. Let me check the current tracking status.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "track_shipment", |
| "arguments": { |
| "carrier": "FastShip", |
| "tracking_number": "FS789456123" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"status\": \"delayed\", \"last_location\": \"Regional Distribution Center\", \"last_update\": \"2024-01-18\", \"issue\": \"Weather delay - severe storms\"}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Package is delayed due to weather at distribution center. Last update was 6 days ago. This is a carrier issue but I should offer compensation.", |
| "content": "I see the issue now. Your package is currently at a Regional Distribution Center and has been delayed due to severe weather conditions. The last update was on January 18th.\n\nI sincerely apologize for this inconvenience. Here's what I can do:\n\n1. I'll flag this shipment as priority with FastShip\n2. I'd like to offer you a **15% refund** for the delay\n\nWould you like me to proceed with both of these actions?" |
| }, |
| { |
| "role": "user", |
| "content": "Yes please, and can you also send me updates by email?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Customer agreed. I need to: 1) Flag shipment as priority, 2) Process 15% refund, 3) Set up email notifications.", |
| "content": "Absolutely! Let me take care of all of that for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "flag_priority", |
| "arguments": { |
| "tracking_number": "FS789456123" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "process_refund", |
| "arguments": { |
| "order_id": "12345", |
| "percentage": 15 |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "setup_notifications", |
| "arguments": { |
| "order_id": "12345", |
| "email": true |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"success\": true, \"message\": \"Shipment flagged as priority\"}" |
| }, |
| { |
| "role": "tool", |
| "content": "{\"success\": true, \"refund_amount\": \"$11.25\", \"message\": \"Refund processed\"}" |
| }, |
| { |
| "role": "tool", |
| "content": "{\"success\": true, \"message\": \"Email notifications enabled\"}" |
| }, |
| { |
| "role": "assistant", |
| "content": "All done! Here's a summary of what I've done:\n\n✅ **Priority Flag**: Your shipment is now flagged as priority with FastShip\n✅ **Refund**: $11.25 (15%) has been refunded to your original payment method\n✅ **Email Updates**: You'll receive tracking updates via email\n\nIs there anything else I can help you with?" |
| }, |
| { |
| "role": "user", |
| "content": "No that's great, thank you so much!" |
| }, |
| { |
| "role": "assistant", |
| "content": "You're very welcome! I hope your headphones arrive soon. If you have any other questions or concerns, don't hesitate to reach out. Have a wonderful day! 😊" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "lookup_order", |
| "description": "Look up order details", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "order_id": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| }, |
| { |
| "type": "function", |
| "function": { |
| "name": "track_shipment", |
| "description": "Track a shipment", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "carrier": { |
| "type": "string" |
| }, |
| "tracking_number": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| }, |
| { |
| "type": "function", |
| "function": { |
| "name": "flag_priority", |
| "description": "Flag shipment as priority", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "tracking_number": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| }, |
| { |
| "type": "function", |
| "function": { |
| "name": "process_refund", |
| "description": "Process a refund", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "order_id": { |
| "type": "string" |
| }, |
| "percentage": { |
| "type": "number" |
| } |
| } |
| } |
| } |
| }, |
| { |
| "type": "function", |
| "function": { |
| "name": "setup_notifications", |
| "description": "Setup notifications", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "order_id": { |
| "type": "string" |
| }, |
| "email": { |
| "type": "boolean" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a helpful customer service agent for TechStore. Be empathetic and solution-oriented. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Look up order details", "name": "lookup_order", "parameters": {"properties": {"order_id": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| {"function": {"description": "Track a shipment", "name": "track_shipment", "parameters": {"properties": {"carrier": {"type": "string"}, "tracking_number": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| {"function": {"description": "Flag shipment as priority", "name": "flag_priority", "parameters": {"properties": {"tracking_number": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| {"function": {"description": "Process a refund", "name": "process_refund", "parameters": {"properties": {"order_id": {"type": "string"}, "percentage": {"type": "number"}}, "type": "object"}}, "type": "function"} |
| {"function": {"description": "Setup notifications", "name": "setup_notifications", "parameters": {"properties": {"email": {"type": "boolean"}, "order_id": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Hi, my order #12345 hasn't arrived and it's been 2 weeks!<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Customer is frustrated about delayed order. I should empathize first, then look up their order.</think>I'm so sorry to hear about this delay! Two weeks is definitely longer than expected. Let me look up your order right away. |
| <tool_call> |
| {"name": "lookup_order", "arguments": {"order_id": "12345"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"order_id": "12345", "status": "shipped", "shipped_date": "2024-01-10", "carrier": "FastShip", "tracking": "FS789456123", "items": ["Wireless Headphones"], "delivery_estimate": "2024-01-15"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Order was shipped on Jan 10 with estimated delivery Jan 15. It's clearly late. I should check the tracking status.</think>I found your order. It was shipped on January 10th with FastShip (tracking: FS789456123) and was estimated to arrive by January 15th. Let me check the current tracking status. |
| <tool_call> |
| {"name": "track_shipment", "arguments": {"carrier": "FastShip", "tracking_number": "FS789456123"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"status": "delayed", "last_location": "Regional Distribution Center", "last_update": "2024-01-18", "issue": "Weather delay - severe storms"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Package is delayed due to weather at distribution center. Last update was 6 days ago. This is a carrier issue but I should offer compensation.</think>I see the issue now. Your package is currently at a Regional Distribution Center and has been delayed due to severe weather conditions. The last update was on January 18th. |
|
|
| I sincerely apologize for this inconvenience. Here's what I can do: |
|
|
| 1. I'll flag this shipment as priority with FastShip |
| 2. I'd like to offer you a **15% refund** for the delay |
|
|
| Would you like me to proceed with both of these actions?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Yes please, and can you also send me updates by email?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Customer agreed. I need to: 1) Flag shipment as priority, 2) Process 15% refund, 3) Set up email notifications.</think>Absolutely! Let me take care of all of that for you. |
| <tool_call> |
| {"name": "flag_priority", "arguments": {"tracking_number": "FS789456123"}} |
| </tool_call> |
| <tool_call> |
| {"name": "process_refund", "arguments": {"order_id": "12345", "percentage": 15}} |
| </tool_call> |
| <tool_call> |
| {"name": "setup_notifications", "arguments": {"email": true, "order_id": "12345"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"success": true, "message": "Shipment flagged as priority"} |
| </tool_response> |
| <tool_response> |
| {"success": true, "refund_amount": "$11.25", "message": "Refund processed"} |
| </tool_response> |
| <tool_response> |
| {"success": true, "message": "Email notifications enabled"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| All done! Here's a summary of what I've done: |
|
|
| ✅ **Priority Flag**: Your shipment is now flagged as priority with FastShip |
| ✅ **Refund**: $11.25 (15%) has been refunded to your original payment method |
| ✅ **Email Updates**: You'll receive tracking updates via email |
|
|
| Is there anything else I can help you with?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| No that's great, thank you so much!<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| You're very welcome! I hope your headphones arrive soon. If you have any other questions or concerns, don't hesitate to reach out. Have a wonderful day! 😊<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 34. Full conversation: No tools, multi-turn creative writing with thinking |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a creative writing assistant. Think through your creative choices." |
| }, |
| { |
| "role": "user", |
| "content": "Help me write the opening line of a mystery novel set in Victorian London." |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Victorian London mystery - I should evoke fog, gas lamps, cobblestones. The opening should hook the reader with intrigue. Maybe start with an unusual discovery or a sense of foreboding.", |
| "content": "Here's an opening line:\n\n*\"The fog had swallowed three souls that week in Whitechapel, and as Inspector Blackwood struck a match against the damp brick wall, he suspected the fourth lay crumpled at his feet.\"*\n\nThis establishes: Victorian setting (fog, Whitechapel), mystery/danger (swallowed souls), protagonist (Inspector Blackwood), and immediate tension (a body). Would you like me to try a different tone or approach?" |
| }, |
| { |
| "role": "user", |
| "content": "I like it! Can you write the next paragraph?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Good, they liked the noir tone. The next paragraph should describe the scene, introduce sensory details, and perhaps hint at what makes this death unusual. I'll describe what the inspector sees and hint at why this case is different.", |
| "content": "Here's the next paragraph:\n\n*The match flame danced twice before the Thames wind extinguished it, but those two heartbeats of light were enough. A woman—young, judging by her unlined hands—lay face-down on the cobblestones, her emerald dress pooling around her like spilled ink in the darkness. No blood, Blackwood noted. No visible wound. Yet death had found her all the same, and the single white rose clutched in her rigid fingers told him this was no accident. Someone had wanted her found, and found precisely like this.*\n\nThis deepens the mystery with the unusual details (no wounds, the white rose, the deliberate positioning). Want me to continue?" |
| }, |
| { |
| "role": "user", |
| "content": "Perfect! One more paragraph introducing a clue." |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to introduce a clue that will drive the investigation. Something small but significant. A calling card? A cryptic note? Something that connects to a larger conspiracy would be good. I'll go with a small but intriguing physical clue.", |
| "content": "Here's the next paragraph with a key clue:\n\n*Blackwood knelt, his knees protesting against the cold stone, and gently turned the rose. There, tucked beneath the thorns where only someone looking would find it, was a small brass key—no larger than his thumbnail—engraved with a serpent devouring its own tail. He'd seen that symbol before, carved into the mantle of Lord Ashworth's study during the Pemberton affair, a case the Crown had ordered buried. The dead woman had just connected herself to the most dangerous secret in London, and Blackwood suspected he was no longer the only one standing in the fog.*\n\nThe serpent key connects to a past case and larger conspiracy, while the final line suggests he's being watched. This gives the story momentum!" |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
| You are a creative writing assistant. Think through your creative choices.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Help me write the opening line of a mystery novel set in Victorian London.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Victorian London mystery - I should evoke fog, gas lamps, cobblestones. The opening should hook the reader with intrigue. Maybe start with an unusual discovery or a sense of foreboding.</think>Here's an opening line: |
|
|
| *"The fog had swallowed three souls that week in Whitechapel, and as Inspector Blackwood struck a match against the damp brick wall, he suspected the fourth lay crumpled at his feet."* |
|
|
| This establishes: Victorian setting (fog, Whitechapel), mystery/danger (swallowed souls), protagonist (Inspector Blackwood), and immediate tension (a body). Would you like me to try a different tone or approach?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| I like it! Can you write the next paragraph?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Good, they liked the noir tone. The next paragraph should describe the scene, introduce sensory details, and perhaps hint at what makes this death unusual. I'll describe what the inspector sees and hint at why this case is different.</think>Here's the next paragraph: |
|
|
| *The match flame danced twice before the Thames wind extinguished it, but those two heartbeats of light were enough. A woman—young, judging by her unlined hands—lay face-down on the cobblestones, her emerald dress pooling around her like spilled ink in the darkness. No blood, Blackwood noted. No visible wound. Yet death had found her all the same, and the single white rose clutched in her rigid fingers told him this was no accident. Someone had wanted her found, and found precisely like this.* |
|
|
| This deepens the mystery with the unusual details (no wounds, the white rose, the deliberate positioning). Want me to continue?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Perfect! One more paragraph introducing a clue.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Need to introduce a clue that will drive the investigation. Something small but significant. A calling card? A cryptic note? Something that connects to a larger conspiracy would be good. I'll go with a small but intriguing physical clue.</think>Here's the next paragraph with a key clue: |
|
|
| *Blackwood knelt, his knees protesting against the cold stone, and gently turned the rose. There, tucked beneath the thorns where only someone looking would find it, was a small brass key—no larger than his thumbnail—engraved with a serpent devouring its own tail. He'd seen that symbol before, carved into the mantle of Lord Ashworth's study during the Pemberton affair, a case the Crown had ordered buried. The dead woman had just connected herself to the most dangerous secret in London, and Blackwood suspected he was no longer the only one standing in the fog.* |
|
|
| The serpent key connects to a past case and larger conspiracy, while the final line suggests he's being watched. This gives the story momentum!<|eot_id|> |
| ``` |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: 35. Full conversation: Database operations with multiple queries |
| -------------------------------------------------------------------------------- |
|
|
| INPUT: |
| messages = [ |
| { |
| "role": "system", |
| "content": "You are a database administrator assistant with SQL access." |
| }, |
| { |
| "role": "user", |
| "content": "I need to find all customers who made purchases over $1000 last month and haven't been contacted by sales yet." |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to join customers with orders, filter by amount > 1000 and date range, then exclude those who have been contacted. I'll need to query the database.", |
| "content": "I'll query the database to find those customers for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "execute_sql", |
| "arguments": { |
| "query": "SELECT c.customer_id, c.name, c.email, SUM(o.amount) as total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH) AND c.customer_id NOT IN (SELECT customer_id FROM sales_contacts WHERE contact_date >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)) GROUP BY c.customer_id HAVING total_spent > 1000 ORDER BY total_spent DESC" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"rows\": [{\"customer_id\": 1042, \"name\": \"Alice Chen\", \"email\": \"alice@example.com\", \"total_spent\": 2450.00}, {\"customer_id\": 987, \"name\": \"Bob Martinez\", \"email\": \"bob@example.com\", \"total_spent\": 1875.50}, {\"customer_id\": 1156, \"name\": \"Carol White\", \"email\": \"carol@example.com\", \"total_spent\": 1230.00}], \"row_count\": 3}" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Found 3 customers. Alice spent the most at $2450. I should present this clearly and offer to help with follow-up actions.", |
| "content": "I found **3 customers** who meet your criteria:\n\n| Customer | Email | Total Spent |\n|----------|-------|-------------|\n| Alice Chen | alice@example.com | $2,450.00 |\n| Bob Martinez | bob@example.com | $1,875.50 |\n| Carol White | carol@example.com | $1,230.00 |\n\nWould you like me to:\n1. Create a sales contact task for each of them?\n2. Export this list to a CSV?\n3. Get more details about their purchase history?" |
| }, |
| { |
| "role": "user", |
| "content": "Yes, create sales contact tasks for all three" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to insert three records into the sales_contacts or tasks table.", |
| "content": "Creating sales contact tasks for all three customers.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "execute_sql", |
| "arguments": { |
| "query": "INSERT INTO sales_tasks (customer_id, task_type, priority, created_date, notes) VALUES (1042, 'follow_up', 'high', CURDATE(), 'High-value customer - $2450 spent last month'), (987, 'follow_up', 'high', CURDATE(), 'High-value customer - $1875.50 spent last month'), (1156, 'follow_up', 'medium', CURDATE(), 'High-value customer - $1230 spent last month')" |
| } |
| } |
| } |
| ] |
| }, |
| { |
| "role": "tool", |
| "content": "{\"affected_rows\": 3, \"message\": \"3 rows inserted successfully\"}" |
| }, |
| { |
| "role": "assistant", |
| "content": "Done! I've created sales contact tasks for all three customers:\n\n- ✅ **Alice Chen** - High priority (highest spender)\n- ✅ **Bob Martinez** - High priority\n- ✅ **Carol White** - Medium priority\n\nThe tasks are now in the `sales_tasks` table and should appear in your team's task queue. Would you like me to assign these to specific sales reps?" |
| } |
| ] |
| tools = [ |
| { |
| "type": "function", |
| "function": { |
| "name": "execute_sql", |
| "description": "Execute SQL query", |
| "parameters": { |
| "type": "object", |
| "properties": { |
| "query": { |
| "type": "string" |
| } |
| } |
| } |
| } |
| } |
| ] |
| add_generation_prompt = False |
| keep_thinking = True |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| You are a database administrator assistant with SQL access. |
|
|
| You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags. |
|
|
| <tools> |
| {"function": {"description": "Execute SQL query", "name": "execute_sql", "parameters": {"properties": {"query": {"type": "string"}}, "type": "object"}}, "type": "function"} |
| </tools> |
|
|
| For each function call return a JSON object, with the following pydantic model json schema: |
|
|
| {"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]} |
|
|
| Each function call should be enclosed within <tool_call> </tool_call> XML tags. |
|
|
| Example: |
| <tool_call> |
| {"name": "read_file", "arguments": {"path": "/path/to/file.txt"}} |
| </tool_call> |
| <|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| I need to find all customers who made purchases over $1000 last month and haven't been contacted by sales yet.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Need to join customers with orders, filter by amount > 1000 and date range, then exclude those who have been contacted. I'll need to query the database.</think>I'll query the database to find those customers for you. |
| <tool_call> |
| {"name": "execute_sql", "arguments": {"query": "SELECT c.customer_id, c.name, c.email, SUM(o.amount) as total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date \u003e= DATE_SUB(CURDATE(), INTERVAL 1 MONTH) AND c.customer_id NOT IN (SELECT customer_id FROM sales_contacts WHERE contact_date \u003e= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)) GROUP BY c.customer_id HAVING total_spent \u003e 1000 ORDER BY total_spent DESC"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"rows": [{"customer_id": 1042, "name": "Alice Chen", "email": "alice@example.com", "total_spent": 2450.00}, {"customer_id": 987, "name": "Bob Martinez", "email": "bob@example.com", "total_spent": 1875.50}, {"customer_id": 1156, "name": "Carol White", "email": "carol@example.com", "total_spent": 1230.00}], "row_count": 3} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Found 3 customers. Alice spent the most at $2450. I should present this clearly and offer to help with follow-up actions.</think>I found **3 customers** who meet your criteria: |
|
|
| | Customer | Email | Total Spent | |
| |----------|-------|-------------| |
| | Alice Chen | alice@example.com | $2,450.00 | |
| | Bob Martinez | bob@example.com | $1,875.50 | |
| | Carol White | carol@example.com | $1,230.00 | |
|
|
| Would you like me to: |
| 1. Create a sales contact task for each of them? |
| 2. Export this list to a CSV? |
| 3. Get more details about their purchase history?<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| Yes, create sales contact tasks for all three<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Need to insert three records into the sales_contacts or tasks table.</think>Creating sales contact tasks for all three customers. |
| <tool_call> |
| {"name": "execute_sql", "arguments": {"query": "INSERT INTO sales_tasks (customer_id, task_type, priority, created_date, notes) VALUES (1042, \u0027follow_up\u0027, \u0027high\u0027, CURDATE(), \u0027High-value customer - $2450 spent last month\u0027), (987, \u0027follow_up\u0027, \u0027high\u0027, CURDATE(), \u0027High-value customer - $1875.50 spent last month\u0027), (1156, \u0027follow_up\u0027, \u0027medium\u0027, CURDATE(), \u0027High-value customer - $1230 spent last month\u0027)"}} |
| </tool_call><|eot_id|><|start_header_id|>tool<|end_header_id|> |
|
|
| <tool_response> |
| {"affected_rows": 3, "message": "3 rows inserted successfully"} |
| </tool_response><|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| Done! I've created sales contact tasks for all three customers: |
|
|
| - ✅ **Alice Chen** - High priority (highest spender) |
| - ✅ **Bob Martinez** - High priority |
| - ✅ **Carol White** - Medium priority |
|
|
| The tasks are now in the `sales_tasks` table and should appear in your team's task queue. Would you like me to assign these to specific sales reps?<|eot_id|> |
| ``` |
|
|
|
|
|
|
|
|
| ================================================================================ |
| INVERSE TESTS - Parsing Raw Model Output |
| ================================================================================ |
|
|
| These tests verify that raw model output can be parsed back into |
| structured messages with separate reasoning_content and content fields. |
|
|
| -------------------------------------------------------------------------------- |
| TEST: I1. Simple response without thinking |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| The answer is 42. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "content": "The answer is 42." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I2. Response with thinking (no newlines in tags) |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>Let me calculate this carefully.</think>The answer is 42. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Let me calculate this carefully.", |
| "content": "The answer is 42." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I3. Response with thinking (with newlines inside) |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>First, I need to consider the problem. |
| Then I'll work through it step by step. |
| The calculation shows 42.</think>The answer is 42. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "First, I need to consider the problem.\nThen I'll work through it step by step.\nThe calculation shows 42.", |
| "content": "The answer is 42." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I4. Long thinking with detailed reasoning |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>The user is asking about the square root of 144. |
| I know that: |
| - 10 * 10 = 100 |
| - 11 * 11 = 121 |
| - 12 * 12 = 144 |
|
|
| So the square root of 144 is 12.</think>The square root of 144 is **12**. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "The user is asking about the square root of 144.\nI know that:\n- 10 * 10 = 100\n- 11 * 11 = 121 \n- 12 * 12 = 144\n\nSo the square root of 144 is 12.", |
| "content": "The square root of 144 is **12**." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I5. Response with tool call (no thinking) |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| I'll check the weather for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Tokyo"}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "content": "I'll check the weather for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Tokyo" |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I6. Response with thinking AND tool call |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>The user wants to know the weather in Paris. I should use the get_weather function.</think>Let me check that for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Paris"}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants to know the weather in Paris. I should use the get_weather function.", |
| "content": "Let me check that for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Paris" |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I7. Response with multiple tool calls |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>The user wants weather for two cities. I'll call the function twice.</think>I'll check both locations. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Tokyo"}} |
| </tool_call> |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "London"}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants weather for two cities. I'll call the function twice.", |
| "content": "I'll check both locations.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Tokyo" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "London" |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I8. Empty thinking block |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think></think>Here's my response. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "", |
| "content": "Here's my response." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I9. Only thinking, no visible response |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>I'm just thinking here, no response needed.</think> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "I'm just thinking here, no response needed.", |
| "content": "" |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I10. Response with code blocks (shouldn't confuse parser) |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>The user wants a Python example.</think>Here's the code: |
|
|
| ```python |
| def hello(): |
| print("<hello>") # Note the angle brackets |
| ``` |
|
|
| This prints a greeting. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants a Python example.", |
| "content": "Here's the code:\n\n```python\ndef hello():\n print(\"<hello>\") # Note the angle brackets\n```\n\nThis prints a greeting." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I11. Response with nested angle brackets |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>Need to show HTML tags</think>Use `<div>` and `</div>` for containers. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to show HTML tags", |
| "content": "Use `<div>` and `</div>` for containers." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I12. Unicode in thinking |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>ユーザーは日本語で挨拶を求めています。</think>こんにちは! (Hello in Japanese) |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "ユーザーは日本語で挨拶を求めています。", |
| "content": "こんにちは! (Hello in Japanese)" |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I13. Multiline content after thinking |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>Let me format this nicely.</think>Here are the steps: |
|
|
| 1. First step |
| 2. Second step |
| 3. Third step |
|
|
| That's all! |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Let me format this nicely.", |
| "content": "Here are the steps:\n\n1. First step\n2. Second step\n3. Third step\n\nThat's all!" |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I14. Whitespace variations around think tags |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think> Some thinking </think> Some content |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Some thinking", |
| "content": "Some content" |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I15. Full assistant turn: thinking + content + single tool call |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>The user wants weather data. I need to call the get_weather function with the location parameter.</think>Let me check the weather for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "San Francisco", "unit": "fahrenheit"}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants weather data. I need to call the get_weather function with the location parameter.", |
| "content": "Let me check the weather for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "San Francisco", |
| "unit": "fahrenheit" |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I16. Full assistant turn: thinking + content + multiple tool calls |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>The user wants to compare weather in two cities. I'll call get_weather twice with different locations.</think>I'll check both cities for you. |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "New York"}} |
| </tool_call> |
| <tool_call> |
| {"name": "get_weather", "arguments": {"location": "Los Angeles"}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "The user wants to compare weather in two cities. I'll call get_weather twice with different locations.", |
| "content": "I'll check both cities for you.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "New York" |
| } |
| } |
| }, |
| { |
| "function": { |
| "name": "get_weather", |
| "arguments": { |
| "location": "Los Angeles" |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I17. Full assistant turn: long reasoning with detailed steps |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>Let me analyze this step by step: |
|
|
| 1. First, I need to understand the user's request |
| 2. They want to calculate compound interest |
| 3. The formula is A = P(1 + r/n)^(nt) |
| 4. Given: P = $1000, r = 5% = 0.05, n = 12 (monthly), t = 10 years |
| 5. A = 1000(1 + 0.05/12)^(12*10) |
| 6. A = 1000(1.004167)^120 |
| 7. A ≈ $1,647.01 |
|
|
| The interest earned would be $647.01</think>Based on my calculations, if you invest **$1,000** at **5% annual interest** compounded monthly for **10 years**, you'll have approximately **$1,647.01**. |
|
|
| That's a total interest earned of **$647.01**! |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Let me analyze this step by step:\n\n1. First, I need to understand the user's request\n2. They want to calculate compound interest\n3. The formula is A = P(1 + r/n)^(nt)\n4. Given: P = $1000, r = 5% = 0.05, n = 12 (monthly), t = 10 years\n5. A = 1000(1 + 0.05/12)^(12*10)\n6. A = 1000(1.004167)^120\n7. A ≈ $1,647.01\n\nThe interest earned would be $647.01", |
| "content": "Based on my calculations, if you invest **$1,000** at **5% annual interest** compounded monthly for **10 years**, you'll have approximately **$1,647.01**.\n\nThat's a total interest earned of **$647.01**!" |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I18. Full assistant turn: tool call only (no thinking, no content) |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <tool_call> |
| {"name": "get_current_time", "arguments": {"timezone": "UTC"}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "content": "", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "get_current_time", |
| "arguments": { |
| "timezone": "UTC" |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I19. Full assistant turn: complex JSON in tool arguments |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>Need to create a new user with nested profile data.</think>Creating the user account now. |
| <tool_call> |
| {"name": "create_user", "arguments": {"username": "john_doe", "profile": {"first_name": "John", "last_name": "Doe", "preferences": {"theme": "dark", "notifications": true}}, "roles": ["user", "editor"]}} |
| </tool_call> |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to create a new user with nested profile data.", |
| "content": "Creating the user account now.", |
| "tool_calls": [ |
| { |
| "function": { |
| "name": "create_user", |
| "arguments": { |
| "username": "john_doe", |
| "profile": { |
| "first_name": "John", |
| "last_name": "Doe", |
| "preferences": { |
| "theme": "dark", |
| "notifications": true |
| } |
| }, |
| "roles": [ |
| "user", |
| "editor" |
| ] |
| } |
| } |
| } |
| ] |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I20. Full assistant turn: markdown formatting in content |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>User wants a summary with proper formatting. I'll use headers, lists, and code blocks.</think>## Summary of Results |
|
|
| Here are the key findings: |
|
|
| 1. **Performance improved by 45%** |
| 2. *Memory usage* decreased |
| 3. No breaking changes detected |
|
|
| ```python |
| # Sample code |
| def optimize(): |
| return "faster" |
| ``` |
|
|
| | Metric | Before | After | |
| |--------|--------|-------| |
| | Speed | 100ms | 55ms | |
| | Memory | 512MB | 384MB | |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "User wants a summary with proper formatting. I'll use headers, lists, and code blocks.", |
| "content": "## Summary of Results\n\nHere are the key findings:\n\n1. **Performance improved by 45%**\n2. *Memory usage* decreased\n3. No breaking changes detected\n\n```python\n# Sample code\ndef optimize():\n return \"faster\"\n```\n\n| Metric | Before | After |\n|--------|--------|-------|\n| Speed | 100ms | 55ms |\n| Memory | 512MB | 384MB |" |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I21. Full assistant turn: response after tool results (continuation) |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>I received the weather data. Tokyo is 22°C and partly cloudy, London is 15°C and rainy. Tokyo has better weather for visiting.</think>Based on the weather data: |
|
|
| **Tokyo**: 22°C, Partly cloudy ☀️ |
| **London**: 15°C, Rainy 🌧️ |
|
|
| I'd recommend visiting **Tokyo** if you want better weather! It's warmer and not raining. |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "I received the weather data. Tokyo is 22°C and partly cloudy, London is 15°C and rainy. Tokyo has better weather for visiting.", |
| "content": "Based on the weather data:\n\n**Tokyo**: 22°C, Partly cloudy ☀️\n**London**: 15°C, Rainy 🌧️\n\nI'd recommend visiting **Tokyo** if you want better weather! It's warmer and not raining." |
| } |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: I22. Full assistant turn: customer service response with empathy |
| -------------------------------------------------------------------------------- |
|
|
| RAW MODEL OUTPUT: |
| ``` |
| <think>Customer is upset about delayed order. I should acknowledge their frustration, apologize, and then provide a solution. The tracking shows weather delay at distribution center.</think>I completely understand your frustration, and I sincerely apologize for this delay. Two weeks is much longer than expected. |
|
|
| I've checked your order and found that it's been held at a distribution center due to severe weather conditions. Here's what I can do to help: |
|
|
| 1. ✅ Flag your shipment as priority |
| 2. ✅ Issue a 15% refund for the inconvenience |
| 3. ✅ Set up email notifications for tracking updates |
|
|
| Would you like me to proceed with all three options? |
| ``` |
|
|
| PARSED MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Customer is upset about delayed order. I should acknowledge their frustration, apologize, and then provide a solution. The tracking shows weather delay at distribution center.", |
| "content": "I completely understand your frustration, and I sincerely apologize for this delay. Two weeks is much longer than expected.\n\nI've checked your order and found that it's been held at a distribution center due to severe weather conditions. Here's what I can do to help:\n\n1. ✅ Flag your shipment as priority\n2. ✅ Issue a 15% refund for the inconvenience\n3. ✅ Set up email notifications for tracking updates\n\nWould you like me to proceed with all three options?" |
| } |
|
|
|
|
|
|
|
|
| ================================================================================ |
| ROUNDTRIP TESTS - Template then Parse |
| ================================================================================ |
|
|
| These tests verify that structured messages survive a roundtrip: |
| structured -> apply_template -> parse_output -> compare |
|
|
| -------------------------------------------------------------------------------- |
| TEST: R1. Simple reasoning + content |
| -------------------------------------------------------------------------------- |
|
|
| ORIGINAL MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "This is my reasoning.", |
| "content": "This is my answer." |
| } |
|
|
| TEMPLATED OUTPUT (assistant part only): |
| ``` |
|
|
| <think>This is my reasoning.</think>This is my answer. |
| ``` |
|
|
| PARSED BACK: |
| { |
| "role": "assistant", |
| "reasoning_content": "This is my reasoning.", |
| "content": "This is my answer." |
| } |
|
|
| ✓ ROUNDTRIP SUCCESSFUL |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: R2. Multiline reasoning |
| -------------------------------------------------------------------------------- |
|
|
| ORIGINAL MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Step 1: Think\nStep 2: More thinking\nStep 3: Conclude", |
| "content": "Final answer here." |
| } |
|
|
| TEMPLATED OUTPUT (assistant part only): |
| ``` |
|
|
| <think>Step 1: Think |
| Step 2: More thinking |
| Step 3: Conclude</think>Final answer here. |
| ``` |
|
|
| PARSED BACK: |
| { |
| "role": "assistant", |
| "reasoning_content": "Step 1: Think\nStep 2: More thinking\nStep 3: Conclude", |
| "content": "Final answer here." |
| } |
|
|
| ✓ ROUNDTRIP SUCCESSFUL |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: R3. Only content, no reasoning |
| -------------------------------------------------------------------------------- |
|
|
| ORIGINAL MESSAGE: |
| { |
| "role": "assistant", |
| "content": "Just a simple response." |
| } |
|
|
| TEMPLATED OUTPUT (assistant part only): |
| ``` |
|
|
| Just a simple response. |
| ``` |
|
|
| PARSED BACK: |
| { |
| "role": "assistant", |
| "content": "Just a simple response." |
| } |
|
|
| ✓ ROUNDTRIP SUCCESSFUL |
|
|
|
|
| -------------------------------------------------------------------------------- |
| TEST: R4. Content with special characters |
| -------------------------------------------------------------------------------- |
|
|
| ORIGINAL MESSAGE: |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to handle <brackets> and \"quotes\"", |
| "content": "Here's code: `print('<hello>')`" |
| } |
|
|
| TEMPLATED OUTPUT (assistant part only): |
| ``` |
|
|
| <think>Need to handle <brackets> and "quotes"</think>Here's code: `print('<hello>')` |
| ``` |
|
|
| PARSED BACK: |
| { |
| "role": "assistant", |
| "reasoning_content": "Need to handle <brackets> and \"quotes\"", |
| "content": "Here's code: `print('<hello>')`" |
| } |
|
|
| ✓ ROUNDTRIP SUCCESSFUL |
|
|
|
|
|
|
| ================================================================================ |
| HUGGINGFACE TOKENIZER TEST |
| ================================================================================ |
|
|
| Test with HuggingFace tokenizer successful! |
|
|
| INPUT: |
| [ |
| { |
| "role": "system", |
| "content": "You are helpful." |
| }, |
| { |
| "role": "user", |
| "content": "What is 2+2?" |
| }, |
| { |
| "role": "assistant", |
| "reasoning_content": "Simple arithmetic: 2+2=4", |
| "content": "The answer is 4." |
| } |
| ] |
|
|
| OUTPUT: |
| ``` |
| <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
| You are helpful.<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
| What is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
| <think>Simple arithmetic: 2+2=4</think>The answer is 4.<|eot_id|> |
| ``` |