Tool Calling (jinja template) Issues
Just a heads up, LM Studio is not picking up this model's chat template correctly.
⎿ API Error: 400 {"error":{"message":"Error from provider(lmstudio,kat-dev@q4_k_m: 400): {\"error\":\"Error rendering prompt with jinja template: \\\"Unknown test:
sequence\\\".\\n\\nThis is usually an issue with the model's prompt template. If you are using a popular model, you can try to search the model under lmstudio-community, which will
have fixed prompt templates. If you cannot find one, you are welcome to post this issue to our discord or issue tracker on GitHub. Alternatively, if you know how to write jinja
templates, you can override the prompt template in My Models > model settings > Prompt Template.\"}Error: Error from provider(lmstudio,kat-dev@q4_k_m: 400): {\"error\":\"Error
rendering prompt with jinja template: \\\"Unknown test: sequence\\\".\\n\\nThis is usually an issue with the model's prompt template. If you are using a popular model, you can try
to search the model under lmstudio-community, which will have fixed prompt templates. If you cannot find one, you are welcome to post this issue to our discord or issue tracker on
GitHub. Alternatively, if you know how to write jinja templates, you can override the prompt template in My Models > model settings > Prompt Template.\"}\n at nt
(C:\\Users\\Marshall\\AppData\\Roaming\\npm\\node_modules\\@musistudio\\claude-code-router\\dist\\cli.js:77001:11)\n at l0
(C:\\Users\\Marshall\\AppData\\Roaming\\npm\\node_modules\\@musistudio\\claude-code-router\\dist\\cli.js:77059:11)\n at process.processTicksAndRejections
(node:internal/process/task_queues:105:5)\n at async a0
(C:\\Users\\Marshall\\AppData\\Roaming\\npm\\node_modules\\@musistudio\\claude-code-router\\dist\\cli.js:77026:96)","type":"api_error","code":"provider_response_error"}}
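For context, `sequence` is a built-in test in full Jinja2, but the minimal Jinja engines embedded in local runtimes do not always implement it. If you go the template-override route the error message suggests, a common workaround is rewriting the unsupported test into tests the engine does implement. The template fragment below is an assumption for illustration, not the model's actual template:

```python
# Hypothetical fragment of a chat template that trips minimal Jinja
# engines; the real template text in the model may differ.
template_fragment = "{% if content is sequence %}"

# Rewrite the unsupported `sequence` test into a pair of tests
# (`iterable`, `string`) that minimal engines more commonly support.
fixed = template_fragment.replace(
    "is sequence",
    "is iterable and content is not string",
)

print(fixed)  # {% if content is iterable and content is not string %}
```

Whether this particular rewrite suffices depends on which tests the runtime's Jinja engine actually ships; the lmstudio-community repacks mentioned below do this kind of fix upstream.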
Looks like a lack of support by LM Studio for this model. Have you tried a current llama.cpp version?
Tried it with llama.cpp compiled from source: I don't get the same error, but it does not parse OpenAI-type tool calls. Looking at the original template for the model, it appears to be a vLLM chat template for Qwen3 Coder.
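For reference, this is the kind of OpenAI-style request the server would need to answer with a structured `tool_calls` field; the port, model name, and tool schema here are assumptions for illustration:

```python
import json

# OpenAI-compatible chat completion request carrying a tool definition.
# Model name, endpoint, and the example tool are illustrative assumptions.
payload = {
    "model": "kat-dev",
    "messages": [{"role": "user", "content": "Read the file README.md"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from disk",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
    "tool_choice": "auto",
}

body = json.dumps(payload).encode()
# To actually send it (requires a running server on an assumed port):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:8080/v1/chat/completions", data=body,
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

When tool-call parsing works, the response's `choices[0].message` carries a `tool_calls` array instead of plain text; here the model emitted its tool calls as raw text that the server never converted.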
You'll probably have to wait until LM Studio supports the template, then (although it's not clear what you mean by "original" template: there is only the one template; there aren't different versions for a specific model).
Hello, I see calls fail with Cline, Roo Code, and Kilo Code.
Is this the system prompt problem, or something else?
Right, by "original" I mean that I did not override the chat template for the model in LM Studio or in llama.cpp.
@akierum I'm pretty sure that Cline & Roo Code do not use OpenAI-type tool calls, but instead work by parsing the model output directly. Thus, it is likely a separate issue. Which quant are you running?
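To illustrate the difference: instead of relying on the server's `tool_calls` field, Cline-style clients scan the raw completion text for XML-like tool tags. The tag format below is a simplified assumption, not the exact schema those tools use:

```python
import re

# Simulated raw model output containing an XML-style tool invocation,
# the way Cline / Roo Code expect it embedded in plain text.
raw_output = (
    "I'll read that file.\n"
    "<read_file>\n<path>src/main.py</path>\n</read_file>"
)

# Pull the tool call out of the text; a real client also validates the
# tag name against its known tool set and handles malformed tags.
match = re.search(r"<read_file>\s*<path>(.*?)</path>\s*</read_file>",
                  raw_output, re.S)
tool_call = {"tool": "read_file", "path": match.group(1)} if match else None
print(tool_call)  # {'tool': 'read_file', 'path': 'src/main.py'}
```

Because this path never touches the server's template-driven tool-call parsing, a failure here points at the model's output or the runtime, not the Jinja template.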
@akierum I think your issue is related to KV cache quantization and/or flash attention. I see the same kinds of issues with Qwen3 Coder, Gemma 3, and GPT-OSS when enabling flash attention. I suspect an issue in llama.cpp v1.52, but I've noticed similar issues since v1.50 when using flash attention.
No, this happens whether it is on or off.
Hey folks, looks like the LM Studio community has resolved the template with this: https://huggingface.co/lmstudio-community/KAT-Dev-GGUF
Closing ticket.
Well, I tried the new DevQuasar Kwaipilot.KAT-Dev.Q8_0.gguf.
Now there are no more errors in Cline. Let's see if the output is worth it, as the previous version was way worse than Qwen3 Coder 30B.
Update:
LM Studio still errors out:
2025-10-08 23:50:30 [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 30 messages.
2025-10-08 23:50:30 [INFO]
[LM STUDIO SERVER] Streaming response...
2025-10-08 23:51:06 [ERROR]
The model has crashed without additional information. (Exit code: 18446744072635812000). Error Data: n/a, Additional Data: n/a
2025-10-08 23:51:07 [INFO]
[JIT] Requested model (kwaipilot.kat-dev) is not loaded. Loading "DevQuasar/Kwaipilot.KAT-Dev-GGUF/Kwaipilot.KAT-Dev.Q8_0.gguf" now...
2025-10-08 23:52:38 [INFO]
[LM STUDIO SERVER] Running chat completion on conversation with 30 messages.
2025-10-08 23:52:38 [INFO]
[LM STUDIO SERVER] Streaming response...
2025-10-08 23:56:07 [ERROR]
The model has crashed without additional information. (Exit code: 18446744072635812000). Error Data: n/a, Additional Data: n/a
2025-10-08 23:56:09 [INFO]
[JIT] Requested model (kwaipilot.kat-dev) is not loaded. Loading "DevQuasar/Kwaipilot.KAT-Dev-GGUF/Kwaipilot.KAT-Dev.Q8_0.gguf" now...
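As an aside, that enormous exit code is almost certainly a negative 32-bit Windows status value printed as an unsigned 64-bit integer and then rounded through a JavaScript double (hence the trailing zeros), so the exact NTSTATUS cannot be recovered. A quick sketch of the interpretation:

```python
# Reinterpret the reported unsigned 64-bit exit code as signed.
reported = 18446744072635812000
signed64 = reported - 2**64  # two's-complement view

# Low 32 bits in hex, the form Windows NTSTATUS codes are written in.
# Precision was lost upstream, so only the rough range is meaningful.
print(signed64)                    # -1073739616
print(hex(signed64 & 0xFFFFFFFF))  # 0xc00008a0
```

The value sits in the `0xC0000xxx` range Windows uses for fatal process errors, consistent with a hard crash in the runtime rather than a graceful exit.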
Jan.ai also errors out:
Invalid API Response: The provider returned an empty or unparsable response. This is a provider-side issue where the model failed to generate valid output or returned tool calls that Cline cannot process. Retrying the request may help resolve this issue.
API Request Failed ($0.0000)
502 Proxy request to model failed: error sending request for url (http://127.0.0.1:3643/chat/completions): error trying to connect: tcp connect error: No connection could be made because the target machine actively refused it. (os error 10061)
[22:04:56]
ERROR
Proxy request to model failed: error sending request for url (http://127.0.0.1:3643/chat/completions): error trying to connect: tcp connect error: No connection could be made because the target machine actively refused it. (os error 10061)
[22:04:57]
DEBUG
Handling POST request to /chat/completions requiring model lookup in body
[22:04:57]
DEBUG
Extracted model_id: Kwaipilot.KAT-Dev.Q8_0
[22:04:57]
DEBUG
Found session for model_id Kwaipilot.KAT-Dev.Q8_0
[22:04:57]
DEBUG
Adding session Authorization header
[22:04:57]
DEBUG
Sending buffered body (221820 bytes)
[22:04:59]
ERROR
Proxy request to model failed: error sending request for url (http://127.0.0.1:3643/chat/completions): error trying to connect: tcp connect error: No connection could be made because the target machine actively refused it. (os error 10061)