Gemma_4_Think_SillyTavern_settings for text completion

by Nesaliti - opened Apr 16

Apr 16

Irrelevant for chat completion.
With the settings provided, the thinking part is skipped in text completion mode. Reasoning works with:

{
    "reasoning": {
        "prefix": "<|channel>thought",
        "suffix": "<channel|>",
        "separator": "",
        "name": "gemma4"
    },
    "srw": {
        "value": "<|channel>thought Thinking Process:\n",
        "show": true
    }
}

In other words, it works once this "Start Reply With" prefill (the "srw" block) is added.
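For anyone wondering what the "reasoning" prefix/suffix pair is for: the front end looks for the text between them and shows it as the thought block. Below is a minimal Python sketch of that split, assuming the prefill is prepended to the generated text before parsing. `split_reasoning` is a hypothetical helper, not SillyTavern's actual code.

```python
# Hypothetical helper, NOT SillyTavern's implementation: it only illustrates
# how a reasoning prefix/suffix pair can split a raw completion.
REASONING_PREFIX = "<|channel>thought"  # "prefix" from the config above
REASONING_SUFFIX = "<channel|>"         # "suffix" from the config above


def split_reasoning(text: str,
                    prefix: str = REASONING_PREFIX,
                    suffix: str = REASONING_SUFFIX) -> tuple[str, str]:
    """Return (reasoning, reply) for a raw text-completion response.

    If no complete prefix...suffix block is found, everything is the reply.
    """
    start = text.find(prefix)
    if start == -1:
        return "", text.strip()
    end = text.find(suffix, start + len(prefix))
    if end == -1:  # thought block opened but never closed
        return "", text.strip()
    reasoning = text[start + len(prefix):end].strip()
    reply = (text[:start] + text[end + len(suffix):]).strip()
    return reasoning, reply


if __name__ == "__main__":
    prefill = "<|channel>thought Thinking Process:\n"  # the "srw" value
    generated = "1. The user says hi.<channel|>Hello there!"
    print(split_reasoning(prefill + generated))
```

If the model never emits the suffix, this sketch treats the whole output as the reply rather than guessing where the thinking ends.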

Darkhn

Owner Apr 16

edited Apr 16

TL;DR: most finetunes were done wrong, including my version 14.0 of the 31B, which is why I'm releasing others soon.

You must not prefill anything for reasoning in text completion. If you look at chat completion, it doesn't prefill either: the model learned to output <|channel>thought by itself when <|think|> is present in the system prompt.
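A minimal sketch of the no-prefill approach in text completion: put `<|think|>` in the system turn and end the prompt with a bare `<|turn>model\n`, leaving the model to open `<|channel>thought` itself. `build_prompt` is hypothetical; `<|turn>` comes from the template excerpt in this thread, but the `<turn|>` end-of-turn marker and the exact placement of `<|think|>` are assumptions to verify against Google's official chat template.

```python
# Sketch only. "<|turn>" and "<|think|>" appear in this thread; the "<turn|>"
# end-of-turn marker and the exact placement of "<|think|>" are ASSUMPTIONS.
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"  # assumed end-of-turn marker
THINK_FLAG = "<|think|>"


def build_prompt(system: str, user: str) -> str:
    """Hypothetical single-exchange text-completion prompt with thinking on.

    The prompt ends with a bare model turn: no "<|channel>thought" prefill.
    With THINK_FLAG in the system turn, the model is expected to open the
    thought channel on its own.
    """
    return (
        f"{TURN_OPEN}system\n{THINK_FLAG}{system}{TURN_CLOSE}\n"
        f"{TURN_OPEN}user\n{user}{TURN_CLOSE}\n"
        f"{TURN_OPEN}model\n"
    )


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "Hi!"))
```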

I know the first version of my tune is WRONG: it was trained with a bad chat template, and it must be trained on the last turn only, not on all assistant turns.
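Training "on the last turn only" is commonly done by masking the loss for everything except the final assistant turn. A hypothetical Python sketch, assuming each turn is already tokenized and using -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) as the mask value; `build_labels` is an illustrative name, not part of any trainer's API.

```python
# Hypothetical sketch of "train on the last model turn only".
# Assumes each turn is already tokenized into a list of token ids.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(turns: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """turns: (role, token_ids) pairs in conversation order.

    Returns (input_ids, labels) where only the final "model" turn keeps its
    token ids as labels; every other token is masked with IGNORE_INDEX.
    """
    last_model = max(
        (i for i, (role, _) in enumerate(turns) if role == "model"), default=-1
    )
    input_ids: list[int] = []
    labels: list[int] = []
    for i, (role, ids) in enumerate(turns):
        input_ids.extend(ids)
        labels.extend(ids if i == last_model else [IGNORE_INDEX] * len(ids))
    return input_ids, labels
```

The whole conversation is still fed to the model as context; only the loss on earlier turns is switched off.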

If you look very closely at Google's chat template, reasoning is not even supposed to be seen: it is stripped, and only when tool calls are enabled will it show reasoning.

{%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
    {{- '<|turn>model\n' -}}
    {%- if not enable_thinking | default(false) -%}
        {{- '<|channel>thought\n<channel|>' -}}
    {%- endif -%}
{%- endif -%}
{%- endif -%}
If the previous message is NOT a tool response and NOT a tool call, output <|turn>model\n; and if enable_thinking is not enabled, also output <|channel>thought\n<channel|> (an empty thought block).
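The same generation-prompt logic as a small Python function, for illustration only. `generation_prompt` is a hypothetical name, and it covers only the visible part of the excerpt (its last `endif` closes an outer block that isn't shown). With thinking enabled, nothing follows `<|turn>model\n`, which matches the no-prefill advice above.

```python
from typing import Optional


def generation_prompt(prev_message_type: Optional[str],
                      enable_thinking: bool = False) -> str:
    """Hypothetical Python transcription of the visible part of the excerpt."""
    out = ""
    if prev_message_type not in ("tool_response", "tool_call"):
        # Open a new model turn.
        out += "<|turn>model\n"
        if not enable_thinking:
            # Thinking off: an EMPTY thought block is emitted.
            out += "<|channel>thought\n<channel|>"
    # After a tool call or tool response, this excerpt emits nothing.
    return out
```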
{%- if message['content'] is string -%}
    {%- if role == 'model' -%}
        {#- Check if this is the final turn to decide whether to strip -#}
        {%- if loop.index0 > ns_turn.last_user_idx -%}
            {{- message['content'] | trim -}}
        {%- else -%}
            {{- strip_thinking(message['content']) -}}
        {%- endif -%}
    {%- else -%}
        {{- message['content'] | trim -}}
    {%- endif -%}
{%- elif message['content'] is sequence -%}
    {%- for item in message['content'] -%}
        {%- if item['type'] == 'text' -%}
            {%- if role == 'model' -%}
                {#- Apply the same logic for items in a sequence -#}
                {%- if loop.index0 > ns_turn.last_user_idx -%}
                    {{- item['text'] | trim -}}
                {%- else -%}
                    {{- strip_thinking(item['text']) -}}
                {%- endif -%}
            {%- else -%}
                {{- item['text'] | trim -}}
            {%- endif -%}
        {%- elif item['type'] == 'image' -%}
            {{- '<|image|>' -}}
            {%- set ns.prev_message_type = 'image' -%}
        {%- elif item['type'] == 'audio' -%}
            {{- '<|audio|>' -}}
            {%- set ns.prev_message_type = 'audio' -%}
        {%- elif item['type'] == 'video' -%}
            {{- '<|video|>' -}}
            {%- set ns.prev_message_type = 'video' -%}
        {%- endif -%}
    {%- endfor -%}
{%- endif -%}

That's the fix I did:

If the role is model and we are in the very last message after a user turn, just output the whole block, trimmed (leading and trailing whitespace removed).

If it's NOT the last message after a user turn, strip the thinking.
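A minimal Python sketch of that rule. `strip_thinking` is not shown in the thread, so the version here is an assumption: it just deletes every `<|channel>...<channel|>` span. `render_contents` is likewise a hypothetical name; it compares each message's position in the conversation with the index of the last user message.

```python
import re

# ASSUMPTION: Google's strip_thinking macro is not shown in the thread; this
# stand-in simply removes every "<|channel>...<channel|>" span and trims.
_THOUGHT_RE = re.compile(r"<\|channel>.*?<channel\|>", re.DOTALL)


def strip_thinking(text: str) -> str:
    return _THOUGHT_RE.sub("", text).strip()


def render_contents(messages: list[dict]) -> list[str]:
    """Hypothetical sketch: model messages after the last user message keep
    their thinking (trimmed); earlier model messages have it stripped."""
    last_user_idx = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    rendered = []
    for i, m in enumerate(messages):
        if m["role"] == "model" and i <= last_user_idx:
            rendered.append(strip_thinking(m["content"]))
        else:
            rendered.append(m["content"].strip())
    return rendered
```

Because the check is "after the last user message" rather than "the single last message", every model message in the current turn keeps its thinking, while older turns are sent without it.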

Nesaliti changed discussion status to closed Apr 16

Darkhn changed discussion status to open Apr 17

Darkhn

Owner Apr 17

Wow... what the hell? On vLLM it works fine with both chat and text completion; on llama.cpp chat completion works, but text completion doesn't work without a prefill... VERY STRANGE!
