Reasoning off mode issue

#30
by GergelyZsolt - opened

I tried multiple Qwen models with llama.cpp in --reasoning off mode, and an orphaned </think> tag appears very often. This did not happen with other chat templates.

It happens here too, with this setup:

config.ini:

[*]
n-gpu-layers = all
ctx-size = 65536
threads = 18
batch-size = 2048
ubatch-size = 1024

parallel = 2
mlock = true
mmap = true
; no-mmap = true
flash-attn = true

cache-type-k = q8_0
cache-type-v = q8_0
cache-type-k-draft = q8_0
cache-type-v-draft = q8_0

reasoning = false
prio = 3
seed = 3407
jinja = true

[Qwen3.6-35B-A3B:UD-Q4_K_XL]
model = /models/Qwen3.6/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf
mmproj = /models/Qwen3.6/mmproj-F16.gguf
temperature = 0.7
top-p = 0.8
top-k = 20
min-p = 0.0
presence-penalty = 1.5
repeat-penalty = 1.0
image-min-tokens = 1024
spec-type = draft-mtp
spec-draft-n-max = 2
chat-template-file = /templates/froggeric-chat_template-v19.jinja

open-webui v0.9.5

prompt:

**Build a VRAM and KV cache calculator tool for llama.cpp server.** The tool should include the following parameters: model type (e.g., Qwen2.5-72B), bit precision (4-bit/8-bit), total `--ctx-size`, number of `--parallel` slots, and batch size. The output should display the estimated VRAM usage, KV cache allocation per slot, and warnings if this configuration exceeds physical GPU limits to avoid 400 errors or stuttering/lag when running multiple concurrent threads.

response:

I'll build a comprehensive VRAM and KV Cache Calculator for llama.cpp server. Let me first research the current understanding of these calculations to ensure accuracy.

<function=web_search>
<parameter=search_queries>
["llama.cpp VRAM calculation KV cache formula 2024", "llama.cpp --ctx-size --parallel VRAM usage calculator", "KV cache memory calculation transformer models bits per token"]
</parameter>
</function>
</tool_call>

Any solutions?

I have this block

[image]

Found this one, it works nicely:
https://huggingface.co/spiritbuun/buun-Qwen3.6-chat_template

Thanks, I will give it a try.
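
For anyone who cannot switch templates, the stray tag could also be filtered out on the client side. This is only a rough sketch of a post-processing workaround, not a fix for the template or for llama.cpp: the function name `strip_orphaned_think_close` is made up for illustration, and it assumes you have the full response text as a string before displaying it.

```python
import re


def strip_orphaned_think_close(text: str) -> str:
    """Remove </think> tags that have no matching <think> before them.

    Hypothetical helper: balanced <think>...</think> pairs are left untouched.
    """
    out = []
    depth = 0  # number of currently open <think> tags
    pos = 0    # start of the text not yet copied to `out`
    for m in re.finditer(r"<think>|</think>", text):
        if m.group(0) == "<think>":
            depth += 1
        elif depth > 0:
            depth -= 1
        else:
            # Orphaned closing tag: keep the text before it, drop the tag.
            out.append(text[pos:m.start()])
            pos = m.end()
    out.append(text[pos:])
    # Drop blank lines that were left at the start after removing a leading tag.
    return "".join(out).lstrip("\n")


if __name__ == "__main__":
    print(strip_orphaned_think_close("</think>\n\nI'll build a calculator."))
```

It only cleans up the visible text, so the model still generates the tag; a template that handles the non-reasoning case correctly is the better solution.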
