fix: `clear_thinking` inserts spurious `</think>` tag when `reasoning_content` is empty
Problem
In multi-turn conversations, the current `chat_template.jinja` inserts a bare `</think>` tag into historical assistant messages even when `reasoning_content` is empty. This breaks inference frameworks (e.g., SGLang) that detect the `</think>` token to enforce a thinking budget.
Root Cause
The `clear_thinking` rendering block in the original template has an unconditional `else` branch that always inserts `</think>`:

```jinja
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>' }}
{%- else -%}
{#- always inserts </think>, even when reasoning_content is empty -#}
{{ '</think>' }}
{%- endif -%}
```
When `reasoning_content` is empty (which is common: users typically don't provide `reasoning_content` in multi-turn history), the `else` branch fires and inserts `</think>` unconditionally. Historical assistant messages are then rendered as:

```text
<|assistant|>
</think>
This is a previous answer
```
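To make the failure mode concrete, here is a minimal Python sketch that mirrors the original branch for a single assistant message. The function name `old_thinking_block` and its parameters are hypothetical stand-ins for the template variables (`reasoning_content`, `clear_thinking`, `loop.index0`, `ns.last_user_index`); it only models what this block emits, not the rest of the template, and `None` stands in for an undefined `clear_thinking`.

```python
from typing import Optional


def old_thinking_block(
    reasoning_content: str,
    clear_thinking: Optional[bool],
    index: int,
    last_user_index: int,
) -> str:
    """Hypothetical Python mirror of the original Jinja branch.

    clear_thinking=None stands in for an undefined variable, so the
    Jinja `is defined` test becomes `is not None`.
    """
    keep_thinking = (clear_thinking is not None and not clear_thinking) or index > last_user_index
    if keep_thinking and reasoning_content:
        return "<think>" + reasoning_content.strip() + "</think>"
    # else-branch: reached whenever reasoning_content is empty
    # (or thinking is not kept), and it still emits a bare </think>.
    return "</think>"


# A historical assistant message (index 1, last user message at index 2)
# with no reasoning_content, as in typical multi-turn history.
print(old_thinking_block("", None, index=1, last_user_index=2))
```

The empty-reasoning call falls through to the fallback, which is exactly the bare `</think>` seen in the rendered history.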
Impact
Inference frameworks like SGLang use a `ThinkingBudgetLogitProcessor` that scans the full token sequence (prompt plus generated output) for `</think>` to decide whether the thinking phase has already ended:

```python
# SGLang ThinkingBudgetLogitProcessor (excerpt)
if self.THINKING_END_TOKEN_ID in cur_ids:  # cur_ids = input + output
    continue  # skip budget enforcement
```
The spurious `</think>` tag from history tricks the processor into believing thinking has already concluded, so it skips budget enforcement entirely. As a result:

- 1st turn: `thinking_budget` works correctly.
- 2nd turn onwards: `thinking_budget` silently fails, and the model generates unbounded thinking content.
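The membership check quoted above can be reduced to a toy model. In this hypothetical sketch, token IDs are plain integers, `THINK_END_ID` is a made-up value, and `should_enforce_budget` only reproduces the `THINKING_END_TOKEN_ID in cur_ids` test over prompt plus output; it is not SGLang's actual processor.

```python
THINK_END_ID = 2  # hypothetical token ID for </think>


def should_enforce_budget(prompt_ids: list[int], output_ids: list[int]) -> bool:
    """Return True if the budget logic would still act on this request.

    Mirrors the quoted check: the scan covers prompt AND output tokens,
    so a </think> anywhere in the prompt disables enforcement.
    """
    cur_ids = prompt_ids + output_ids
    return THINK_END_ID not in cur_ids


# Turn 1: no </think> in the prompt, model is still thinking.
turn1_prompt = [10, 11, 12]
# Turn 2: history contains a spurious </think> from a previous assistant message.
turn2_prompt = [10, 11, 12, THINK_END_ID, 13, 14]
model_output = [20, 21]  # thinking tokens generated so far

print(should_enforce_budget(turn1_prompt, model_output))
print(should_enforce_budget(turn2_prompt, model_output))
```

The first call returns `True` and the second `False`: a single `</think>` in the history is enough to switch enforcement off for the whole request.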
Fix
Replace the `clear_thinking` / reasoning rendering block with simpler logic:

```jinja
{#- clear_thinking=true means clear thinking content, no thinking tags at all -#}
{%- if clear_thinking is not defined or not clear_thinking -%}
    {#- clear_thinking=false or undefined: keep thinking content -#}
    {%- if reasoning_content -%}
        {{ '<think>' + reasoning_content.strip() + '</think>' }}
    {%- endif -%}
{%- endif -%}
{#- clear_thinking=true: only output content, no thinking tags -#}
```
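A hypothetical Python mirror of the fixed block, under the same assumptions as before (`None` stands in for an undefined `clear_thinking`, and only the output of this block is modeled), shows that no tag is ever emitted without reasoning text to wrap:

```python
from typing import Optional


def new_thinking_block(reasoning_content: str, clear_thinking: Optional[bool]) -> str:
    """Hypothetical Python mirror of the fixed Jinja block.

    clear_thinking=None stands in for an undefined variable.
    """
    # None (undefined) and False are both falsy, matching
    # `clear_thinking is not defined or not clear_thinking`.
    if not clear_thinking:
        if reasoning_content:
            return "<think>" + reasoning_content.strip() + "</think>"
    # clear_thinking=True, or no reasoning to show: emit no thinking tags.
    return ""


print(repr(new_thinking_block("", None)))
print(repr(new_thinking_block("step 1 ", False)))
```

The design choice is that the tags and the reasoning text now travel together: either both are emitted or neither is.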
Behavior Matrix
| `clear_thinking` | `reasoning_content` | Before (bug) | After (fix) |
|---|---|---|---|
| false / undefined | non-empty | `<think>...</think>` | `<think>...</think>` |
| false / undefined | empty | `</think>` (spurious) | (nothing) |
| true | non-empty | `</think>` (spurious) | (nothing; cleared as intended) |
| true | empty | `</think>` (spurious) | (nothing) |
Key change: when `reasoning_content` is empty, no thinking-related tags are emitted at all, regardless of `clear_thinking`.
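The matrix can be reproduced with a small table-driven sketch. Both functions are hypothetical Python mirrors of the original and fixed blocks for a historical assistant message (one that precedes the last user message, so the `loop.index0 > ns.last_user_index` clause is false), with `clear_thinking` set explicitly to `False` or `True`.

```python
def old_block(reasoning: str, clear_thinking: bool) -> str:
    # Original branch for a historical message (index <= last_user_index).
    if not clear_thinking and reasoning:
        return "<think>" + reasoning.strip() + "</think>"
    return "</think>"  # unconditional fallback


def new_block(reasoning: str, clear_thinking: bool) -> str:
    # Fixed branch: tags only when thinking is kept and reasoning is non-empty.
    if not clear_thinking and reasoning:
        return "<think>" + reasoning.strip() + "</think>"
    return ""


# Each row: (clear_thinking, reasoning non-empty?, before, after)
rows = []
for clear_thinking in (False, True):
    for reasoning in ("...", ""):
        rows.append((
            clear_thinking,
            bool(reasoning),
            old_block(reasoning, clear_thinking),
            new_block(reasoning, clear_thinking),
        ))

for row in rows:
    print(row)
```

For explicit values, the two versions differ only in the fallback: the fix replaces the bare `</think>` with an empty string.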
Verification
| Scenario | Before fix | After fix |
|---|---|---|
| Single-turn with `thinking_budget` | Works | Works |
| Multi-turn with `thinking_budget` | Fails (budget ignored from 2nd turn) | Works |
| `clear_thinking=true` with reasoning history | Inserts spurious `</think>` | Clean output, no tags |
| `clear_thinking=false` with reasoning history | Works | Works (no change) |
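The multi-turn row can be approximated end to end on strings instead of token IDs. This is a hypothetical harness: `build_prompt`, the `<|user|>` / `<|assistant|>` layout, and the string-level `"</think>" in prompt` test are simplifications of the real template and of SGLang's token-level check, and `clear_thinking` is assumed to be `False`.

```python
from typing import Callable


def old_block(reasoning: str) -> str:
    # Original branch with clear_thinking=False: falls back to a bare </think>.
    return "<think>" + reasoning.strip() + "</think>" if reasoning else "</think>"


def new_block(reasoning: str) -> str:
    # Fixed branch with clear_thinking=False: no tags when reasoning is empty.
    return "<think>" + reasoning.strip() + "</think>" if reasoning else ""


def build_prompt(history: list[tuple[str, str]], question: str, block: Callable[[str], str]) -> str:
    """Render (user, assistant) history pairs plus a new user question."""
    parts = []
    for user_msg, assistant_msg in history:
        parts.append("<|user|>\n" + user_msg)
        # History carries no reasoning_content, as is common in practice.
        parts.append("<|assistant|>\n" + block("") + assistant_msg)
    parts.append("<|user|>\n" + question)
    parts.append("<|assistant|>")  # generation prompt (assumed layout)
    return "\n".join(parts)


def budget_active(prompt: str) -> bool:
    # String-level stand-in for "THINKING_END_TOKEN_ID not in cur_ids".
    return "</think>" not in prompt


first_turn = build_prompt([], "Hi", old_block)
second_turn_old = build_prompt([("Hi", "Hello!")], "And now?", old_block)
second_turn_new = build_prompt([("Hi", "Hello!")], "And now?", new_block)

print(budget_active(first_turn), budget_active(second_turn_old), budget_active(second_turn_new))
```

With the original block the second-turn prompt already contains `</think>`, so the budget check is skipped; with the fixed block it stays active.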
Scope
- This is a minimal, targeted fix: only the `clear_thinking` rendering block is changed.
- All other template logic (tools, generation prompt, `enable_thinking`, etc.) remains untouched.
- The same issue also affects `zai-org/GLM-4.7`.