`preserve_thinking` flag ignored?

#9
by astride-thee-squid - opened

Thanks for these fixes!

I see the main page mentions this:

The 3.6 template is a superset — it additionally handles preserve_thinking, </thinking> hallucination recovery, and interrupted thought streams. If you're on 3.6, use the 3.6 file.

But there's no reference to preserve_thinking in the template file. Is preserve_thinking ignored by the template, so that overriding the flag via kwargs is not expected to make any difference?
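
A quick way to check whether a template reads a given kwarg is to search its source for the variable name. The sketch below is only illustrative: the two template strings are made up for the example and are not taken from the actual template file.

```python
import re


def template_mentions(template_source: str, variable: str) -> bool:
    """Return True if the variable name appears as a whole word in the template text."""
    return re.search(rf"\b{re.escape(variable)}\b", template_source) is not None


# Hypothetical one-line templates, for illustration only.
with_flag = "{% if preserve_thinking %}{{ message.reasoning }}{% endif %}"
without_flag = "{{ message.content }}"

print(template_mentions(with_flag, "preserve_thinking"))     # True
print(template_mentions(without_flag, "preserve_thinking"))  # False
```

If the name never appears in the template source, passing it as a kwarg cannot change the rendered prompt.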

astride-thee-squid changed discussion status to closed
astride-thee-squid changed discussion status to open
froggeric changed discussion status to closed
froggeric changed discussion status to open

I think I have finally solved it in v19. So far it has been flawless in 3 long agentic tests in a row. Previously, the problem showed up in around 80% of my sessions.

This has been a tough one to crack. To fix it I had to resort to better prompt engineering:

<IMPORTANT>
Reminder:
- You can use the <think></think> block to plan your next tool call OR to synthesize data and formulate your final response to the user.
- ALL explanation and reasoning MUST be placed strictly inside the <think></think> block.
- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags.
- If you choose to call a tool, you MUST output the <tool_call> block IMMEDIATELY after closing </think>. Do NOT output any conversational text before the tool call.
- The <tool_call> and <function> tags MUST be at the very beginning of a new line, with NO spaces or indentation before them.
- To call multiple functions, output a separate, completely closed <tool_call></tool_call> block for EACH function. Do NOT nest <tool_call> blocks.
- If you have gathered all necessary data and do not need to call a tool, answer the question like normal and provide your final response to the user IMMEDIATELY after closing </think>.
</IMPORTANT>
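
For anyone who wants to measure how often a model breaks these rules, here is a minimal checker sketch. It assumes the tag names from the reminder above; the function name check_tool_call_format and the get_weather tool are made up for the example, and it only covers the formatting rules, not the content of the reasoning.

```python
import re


def check_tool_call_format(output: str) -> list[str]:
    """Return a list of rule violations found in one model turn (empty if none)."""
    problems = []

    # <tool_call> and <function=...> must start a line with no indentation.
    for line in output.splitlines():
        stripped = line.lstrip()
        if stripped != line and stripped.startswith(("<tool_call>", "<function=")):
            problems.append(f"indented tag: {stripped[:20]!r}")

    # Every <tool_call> must be closed, and blocks must not be nested.
    depth = 0
    for tag in re.findall(r"</?tool_call>", output):
        depth += 1 if tag == "<tool_call>" else -1
        if depth > 1:
            problems.append("nested <tool_call>")
            break
        if depth < 0:
            problems.append("stray </tool_call>")
            break
    if depth == 1:
        problems.append("unclosed <tool_call>")

    # Each closed block must contain an inner <function=...></function>.
    for body in re.findall(r"<tool_call>(.*?)</tool_call>", output, flags=re.DOTALL):
        if not re.search(r"<function=[^>]+>.*?</function>", body, flags=re.DOTALL):
            problems.append("tool_call without inner <function=...></function>")

    # No conversational text between </think> and the first tool call.
    if "</think>" in output:
        after_think = output.split("</think>", 1)[1]
        if "<tool_call>" in after_think:
            before_call = after_think.split("<tool_call>", 1)[0]
            if before_call.strip():
                problems.append("text between </think> and <tool_call>")

    return problems


good = "<think>plan</think>\n<tool_call>\n<function=get_weather>\n</function>\n</tool_call>"
bad = ("<think>plan</think>\nSure, let me check.\n"
       "  <tool_call>\n<function=get_weather>\n</function>\n</tool_call>")
print(check_tool_call_format(good))  # []
print(check_tool_call_format(bad))
```

Running it over saved session transcripts gives a rough failure rate to compare between template versions.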

That prompt helped a bit, but did not solve it. What I think finally did it was a complete rewrite of the KV cache handling: setting preserve_thinking to true by default and abolishing the empty think injection, which was poisoning the model's in-context learning.
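
To illustrate the idea (this is a sketch of the behaviour described, not the actual template code), here is how a past assistant turn could be rendered with preserve_thinking on by default and no empty think block injected. The message field names "reasoning" and "content" and the exact whitespace are assumptions.

```python
def render_assistant(message: dict, preserve_thinking: bool = True) -> str:
    """Render one past assistant turn for the prompt history."""
    reasoning = (message.get("reasoning") or "").strip()
    content = message.get("content", "")
    if preserve_thinking and reasoning:
        # Keep the original reasoning so the history matches what the model generated.
        return f"<think>\n{reasoning}\n</think>\n\n{content}"
    # Nothing to show: emit the content alone, with no empty
    # <think></think> placeholder injected in its place.
    return content


turn = {"reasoning": "check the docs", "content": "Done."}
print(render_assistant(turn))
print(render_assistant(turn, preserve_thinking=False))
```

With the flag on, earlier turns keep their thinking; with it off, or when a turn has no reasoning, the history never contains an empty think block for the model to imitate.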

Thanks! Looking forward to testing it out.
