fix: `clear_thinking` inserts spurious `</think>` tag when reasoning_content is empty

#46
by beckyu - opened

Problem

In multi-turn conversations, the current chat_template.jinja inserts a bare </think> tag into historical assistant messages even when reasoning_content is empty. This causes issues with inference frameworks (e.g., SGLang) that rely on </think> token detection to control thinking budget.

Root Cause

The clear_thinking rendering block in the original template has an unconditional else branch that always inserts </think>:

{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() +  '</think>'}}
{%- else -%}
{{ '</think>' }}   ← always inserts </think>, even when reasoning_content is empty
{%- endif -%}

When reasoning_content is empty (which is common β€” users typically don't provide reasoning_content in multi-turn history), the else branch fires and unconditionally inserts </think>. This renders historical assistant messages as:

<|assistant|>
</think>
This is a previous answer

Impact

Inference frameworks like SGLang use a ThinkingBudgetLogitProcessor that scans the full input token sequence for </think> to determine whether the thinking phase has already ended:

# SGLang ThinkingBudgetLogitProcessor
if self.THINKING_END_TOKEN_ID in cur_ids:  # cur_ids = input + output
    continue  # skip budget enforcement

The spurious </think> tag from history tricks the processor into believing thinking has already concluded, causing it to skip budget enforcement entirely. As a result:

  • 1st turn: thinking_budget works correctly
  • 2nd turn onwards: thinking_budget silently fails β€” model generates unbounded thinking content

Fix

Replace the clear_thinking / reasoning rendering block with cleaner logic:

{#- clear_thinking=true means clear thinking content, no thinking tags at all -#}
{%- if clear_thinking is not defined or not clear_thinking -%}
    {#- clear_thinking=false or undefined: keep thinking content -#}
    {%- if reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>' }}
    {%- endif -%}
{%- endif -%}
{#- clear_thinking=true: only output content, no thinking tags -#}

Behavior Matrix

clear_thinking reasoning_content Before (bug) After (fix)
false / undefined non-empty <think>...</think> <think>...</think>
false / undefined empty </think> (spurious) (nothing)
true non-empty </think> (spurious) (nothing β€” cleared as intended)
true empty </think> (spurious) (nothing)

Key change: when reasoning_content is empty, no thinking-related tags are emitted at all, regardless of clear_thinking.

Verification

Scenario Before fix After fix
Single-turn with thinking_budget Works Works
Multi-turn with thinking_budget Fails (budget ignored from 2nd turn) Works
clear_thinking=true with reasoning history Inserts spurious </think> Clean output, no tags
clear_thinking=false with reasoning history Works Works (no change)

Scope

  • This is a minimal, targeted fix β€” only the clear_thinking rendering block is changed
  • All other template logic (tools, generation prompt, enable_thinking, etc.) remains untouched
  • The same issue also affects zai-org/GLM-4.7
Ready to merge
This branch is ready to get merged automatically.

Sign up or log in to comment