`preserve_thinking` flag ignored?

by astride-thee-squid - opened May 10

Discussion

astride-thee-squid

May 10

•

edited May 10

Thanks for these fixes!

I see the main page mention this:

The 3.6 template is a superset — it additionally handles preserve_thinking, </thinking> hallucination recovery, and interrupted thought streams. If you're on 3.6, use the 3.6 file.

But there's no reference to preserve_thinking in the template file. Is preserve_thinking ignored by the template file and kwargs override flag is not expected to make a difference there?

astride-thee-squid changed discussion status to closed May 10

astride-thee-squid changed discussion status to open May 11

froggeric changed discussion status to closed May 14

froggeric changed discussion status to open May 14

froggeric

Owner May 16

I think I have finally solved it in v19. So far it has been flawless in 3 long agentic tests in a row. Previously, I had it happen in around 80% of my sessions.

This has been a tough one to crack. To fix it I had to resort to better prompt engineering:

<IMPORTANT>
Reminder:
- You can use the <think></think> block to plan your next tool call OR to synthesize data and formulate your final response to the user.
- ALL explanation and reasoning MUST be placed strictly inside the <think></think> block.
- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags.
- If you choose to call a tool, you MUST output the <tool_call> block IMMEDIATELY after closing </think>. Do NOT output any conversational text before the tool call.
- The <tool_call> and <function> tags MUST be at the very beginning of a new line, with NO spaces or indentation before them.
- To call multiple functions, output a separate, completely closed <tool_call></tool_call> block for EACH function. Do NOT nest <tool_call> blocks.
- If you have gathered all necessary data and do not need to call a tool, answer the question like normal and provide your final response to the user IMMEDIATELY after closing </think>.
</IMPORTANT>

It helped a bit, but did not solve it. What I think finally did it, was a complete rewrite of the KV cache handling, by setting preserve_thinking to true as default, and abolishing the empty think injection, which was poisoning the model's in-context learning.

astride-thee-squid

May 16

Thanks! Looking forward to testing it out

froggeric

Owner Jun 5

Yes, the preserve_thinking flag is fully integrated and supported in the latest v19 and v20 templates (and defaults to true). Closing this thread.

froggeric changed discussion status to closed Jun 5

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment