[Bug] `enable_thinking` doesn't trigger thinking mode for E4B/E2B

#26
by meeyeong11 - opened

Problem

enable_thinking=True works correctly on gemma-4-31B-it and gemma-4-26B-A4B-it, but has no effect on E4B and E2B.

Root cause: The 31B and 26B-A4B models auto-generate <|channel>thought\n after <|turn>model\n on their own, but E4B/E2B do not. The chat template needs to explicitly add this prefix when enable_thinking=True.

Reproduction

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
messages = [{"role": "user", "content": "What is 2+2?"}]

result = tokenizer.apply_chat_template(
    messages,
    enable_thinking=True,
    add_generation_prompt=True,
    tokenize=False
)
print(result)

The prompt ends with <|turn>model\n. Without the <|channel>thought\n prefix, the model jumps straight into the response and skips the thinking block entirely.
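
A quick way to confirm the symptom is to check the tail of the rendered prompt. The snippet below is only a minimal sketch: the helper name `ends_with_thought_prefix` is hypothetical, the marker strings are copied from this report rather than from the official template, and the sample prompts are stand-ins for real template output.

```python
# Marker strings as quoted in this report (assumed, not taken from the official template).
MODEL_TURN = "<|turn>model\n"
THOUGHT_PREFIX = "<|channel>thought\n"


def ends_with_thought_prefix(prompt: str) -> bool:
    """Return True if the prompt ends with the model turn followed by the thought prefix."""
    return prompt.endswith(MODEL_TURN + THOUGHT_PREFIX)


# Hypothetical prompt tails standing in for the rendered template output.
print(ends_with_thought_prefix("...<|turn>model\n"))                      # False
print(ends_with_thought_prefix("...<|turn>model\n<|channel>thought\n"))  # True
```

Passing the `result` string from the reproduction to this helper should return False on E4B if the behavior matches what is described here.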

Proposed Fix

In chat_template.jinja, add the thought channel prefix conditionally:

{%- if add_generation_prompt -%}
    ...
    {{- '<|turn>model\n' -}}
    {%- if enable_thinking is defined and enable_thinking -%}
        {{- '<|channel>thought\n' -}}
    {%- endif -%}
{%- endif -%}
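
Until the template itself is patched, the same conditional could in principle be applied client-side to the rendered string. This is a sketch under assumptions, not part of transformers or vLLM: `add_thought_prefix` is a hypothetical helper, it assumes the prompt was rendered with tokenize=False, and the marker strings are the ones quoted in this report.

```python
# Marker strings as quoted in this report (assumed, not taken from the official template).
MODEL_TURN = "<|turn>model\n"
THOUGHT_PREFIX = "<|channel>thought\n"


def add_thought_prefix(prompt: str, enable_thinking: bool = False) -> str:
    """Append the thought channel prefix when thinking is requested and it is missing.

    Mirrors the Jinja conditional: the prefix is added only right after the
    model turn marker, and only when enable_thinking is truthy.
    """
    if enable_thinking and prompt.endswith(MODEL_TURN):
        return prompt + THOUGHT_PREFIX
    # Thinking disabled, or the prompt does not end with the bare model turn
    # (for example because the prefix is already there): leave it unchanged.
    return prompt
```

Because the check is on the bare `<|turn>model\n` suffix, calling the helper twice does not duplicate the prefix, so it should be safe to leave in place once the template is fixed.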

Verification

Tested with vLLM

  • Before fix: Model outputs direct answer without thinking block
  • After fix: Model correctly generates thinking block starting with <|channel>thought\n

Note

The same issue exists in E2B.

meeyeong11 changed discussion title from [Bug] `enable_thinking` doesn't trigger thinking mode for E4B/E2B - missing <|channel>thought prefix to [Bug] `enable_thinking` doesn't trigger thinking mode for E4B/E2B
