Gemma_4_Think_SillyTavern_settings for text completion
Irrelevant for chat completion.
With the settings provided the thinking part is skipped in text completion mode. Reasoning works if:
{
"reasoning": {
"prefix": "<|channel>thought",
"suffix": "<channel|>",
"separator": "",
"name": "gemma4"
},
"srw": {
"value": "<|channel>thought Thinking Process:\n",
"show": true
}
}
In other words, with this added "start answer with" prefill.
TLDR: most finetunes were done wrong, including my 14.0 of 31B, which is why i'm coming out with others soon
you must not prefill anything for reasoning on text completion, if you look at chat completion it also do not prefill, the model learned to output <|channel>thought itself, by looking if <|think|> is present in the system prompt.
i know my first version of the tune is WRONG, it was trained using a bad chat template, and must be trained on last turn only, not all assistant turns.
if you look very closely to google chat template, reasoning is not even supposed to be seen, it is stripped, only when tool calls are enabled will it show reasoning.
{%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%} {{- '<|turn>model\n' -}} {%- if not enable_thinking | default(false) -%} {{- '<|channel>thought\n<channel|>' -}} {%- endif -%} {%- endif -%} {%- endif -%}
if the previous message is NOT a tool response and NOT a tool call
do <|turn>model\n
and if enable_thinking is not enabled
do <|channel>thought\n<channel|>{%- if message['content'] is string -%} {%- if role == 'model' -%} {#- Check if this is the final turn to decide whether to strip -#} {%- if loop.index0 > ns_turn.last_user_idx -%} {{- message['content'] | trim -}} {%- else -%} {{- strip_thinking(message['content']) -}} {%- endif -%} {%- else -%} {{- message['content'] | trim -}} {%- endif -%} {%- elif message['content'] is sequence -%} {%- for item in message['content'] -%} {%- if item['type'] == 'text' -%} {%- if role == 'model' -%} {#- Apply the same logic for items in a sequence -#} {%- if loop.index0 > ns_turn.last_user_idx -%} {{- item['text'] | trim -}} {%- else -%} {{- strip_thinking(item['text']) -}} {%- endif -%} {%- else -%} {{- item['text'] | trim -}} {%- endif -%} {%- elif item['type'] == 'image' -%} {{- '<|image|>' -}} {%- set ns.prev_message_type = 'image' -%} {%- elif item['type'] == 'audio' -%} {{- '<|audio|>' -}} {%- set ns.prev_message_type = 'audio' -%} {%- elif item['type'] == 'video' -%} {{- '<|video|>' -}} {%- set ns.prev_message_type = 'video' -%} {%- endif -%} {%- endfor -%} {%- endif -%}
thats the fix i did
if the role is model
and we are the very last message after a user turn
just output the whole block and trim (remove extra spaces)
if its NOT the last message after a user turn
Strip the thinking
wow... what the hell?, on Vllm it works fine both chat and text completion, on llama.cpp chat completion works, but on text completion it doesnt work without a prefill... VERY STRANGE!


