[Bug] `enable_thinking` doesn't trigger thinking mode for E4B/E2B
#26
by meeyeong11 - opened
Problem
enable_thinking=True works correctly on gemma-4-31B-it and gemma-4-26B-A4B-it, but has no effect on E4B and E2B.
Root cause: The 31B and 26B-A4B models auto-generate <|channel>thought\n after <|turn>model\n, but E4B and E2B do not. The chat template needs to add this prefix explicitly when enable_thinking=True.
Reproduction
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-E4B-it")
messages = [{"role": "user", "content": "What is 2+2?"}]

# Render the prompt as text (not token IDs) so the generation prefix can be inspected.
result = tokenizer.apply_chat_template(
    messages,
    enable_thinking=True,
    add_generation_prompt=True,
    tokenize=False,
)
print(result)
The prompt ends with <|turn>model\n. Without the <|channel>thought\n prefix, the model jumps straight to the response and skips the thinking block entirely.
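The manual inspection above can be turned into a small check. The following is only a sketch: the two marker strings are copied from this report, and the helper name `has_thought_prefix` is hypothetical, not part of Transformers.

```python
# Marker strings as quoted in this report.
MODEL_TURN = "<|turn>model\n"
THOUGHT_PREFIX = "<|channel>thought\n"


def has_thought_prefix(prompt: str) -> bool:
    """Return True if the rendered prompt ends with the model turn
    immediately followed by the thought-channel prefix."""
    return prompt.endswith(MODEL_TURN + THOUGHT_PREFIX)
```

Calling `has_thought_prefix(result)` on the string printed by the reproduction should return False on an affected template, assuming the prompt ends as described.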
Proposed Fix
In chat_template.jinja, add the thought channel prefix conditionally:
{%- if add_generation_prompt -%}
    ...
    {{- '<|turn>model\n' -}}
    {%- if enable_thinking is defined and enable_thinking -%}
        {{- '<|channel>thought\n' -}}
    {%- endif -%}
{%- endif -%}
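Where editing chat_template.jinja is not an option, the same conditional could be mirrored on the client side as a stopgap. This is a sketch, not the proposed fix itself: `add_thought_prefix` is a hypothetical helper, and it assumes the rendered prompt is available as a string (tokenize=False) and that appending the prefix afterwards has the same effect as emitting it from the template.

```python
# Marker strings as quoted in this report.
MODEL_TURN = "<|turn>model\n"
THOUGHT_PREFIX = "<|channel>thought\n"


def add_thought_prefix(prompt: str, enable_thinking: bool = False) -> str:
    """Append the thought-channel prefix when thinking is requested and the
    prompt ends with a bare model turn; otherwise return it unchanged."""
    if enable_thinking and prompt.endswith(MODEL_TURN):
        return prompt + THOUGHT_PREFIX
    return prompt
```

Because the function only acts when the prompt ends with the bare model turn, applying it to a prompt that already carries the prefix leaves the prompt unchanged.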
Verification
Tested with vLLM:
- Before fix: The model outputs a direct answer without a thinking block.
- After fix: The model correctly generates a thinking block starting with <|channel>thought\n.
Note
The same issue exists in E2B.
meeyeong11 changed discussion title from [Bug] `enable_thinking` doesn't trigger thinking mode for E4B/E2B - missing <|channel>thought prefix to [Bug] `enable_thinking` doesn't trigger thinking mode for E4B/E2B