Why do most custom models stop in the middle, as if cut off too early?

#2304
by Unpredicted - opened

I tried the model mradermacher/gemma4-E4B-it-disinhibited-GGUF to create a plan from my prompt, but every time a tool call occurred, it would stop in the middle, as if the chat had been cut off. My image as an example:
Screenshot 2026-05-01 215230
So as you can see, right where the model should use mkdir, it just gets cut off...
I also tested it on llama.cpp in the pi-coding-agent CLI.

No idea what pi-coding-agent is, but try increasing max output tokens.

So far I'm already at 65k tokens (if I set it to the model's default of 131072 tokens, my VRAM dies, i.e. it generates tokens very slowly). But the problem occurs even below 5k... so I assume the token limit wasn't the cause; I suspect tool calling instead.

Possibly it emits an end token at the tool call, so you need to handle it somehow, e.g. auto-continue after the tool call or something.
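To illustrate the auto-continue idea above: a minimal sketch of a chat loop that, instead of treating the model's tool-call stop as the end of the turn, executes the tool and feeds the result back so generation continues. Everything here (`run_model`, `dispatch_tool`, the message and stop-reason format) is a hypothetical stand-in, not the actual pi-coding-agent or llama.cpp API.

```python
import json

def run_model(messages):
    # Stub standing in for a real chat-completion call (e.g. to a
    # llama.cpp server). It "stops" with a tool call the first time,
    # then finishes normally once it sees the tool result.
    if not any(m["role"] == "tool" for m in messages):
        return {"stop_reason": "tool_call",
                "tool_call": {"name": "mkdir", "arguments": {"path": "project"}}}
    return {"stop_reason": "end", "content": "Created the directory, plan continues."}

def dispatch_tool(call):
    # Execute the requested tool; here we just echo it back as JSON.
    return json.dumps({"ok": True, "tool": call["name"]})

def chat_with_autocontinue(prompt, max_rounds=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_rounds):
        reply = run_model(messages)
        if reply["stop_reason"] != "tool_call":
            return reply["content"]  # normal end of turn: we're done
        # The model stopped at its tool-call end token; instead of
        # ending the chat there, run the tool and loop to continue.
        result = dispatch_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": result})
    return "(gave up after too many tool rounds)"

print(chat_with_autocontinue("make a project folder"))
```

The key point is the loop: the tool-call end token is a signal to the *agent*, not the end of the conversation, so the agent has to re-invoke the model after appending the tool output.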

Do you by any chance know how? If you don't, I understand, ngl, but if you do, let me know 👍

Just ask some AI to help you with it; I'm not sure what you're using or how.

OK, thanks for the response btw 👍 good luck there
