Why do most custom models stop in the middle, as if cut off too early?
I tried the model mradermacher/gemma4-E4B-it-disinhibited-GGUF to create a plan from my prompt, but every time a tool call occurred, it would stop in the middle. It was like the chat was cut off. My image as an example:
So as you can see, right where the model should use mkdir, it just gets cut off...
I also tested it with llama.cpp in the pi-coding-agent CLI
no idea what py coding agent is, but try increasing max output tokens
So far I'm already at 65k tokens (if I set the model's default context of 131072 tokens, my VRAM is dying, i.e. it generates tokens very slowly). But the problem occurs even below 5k... so I assume the token limit wasn't the cause; tool calling is what I suspect.
possibly it emits an end token at the tool call, so you need to handle that somehow, like auto-continuing after the tool call or something
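For what that auto-continue idea could look like in practice, here is a minimal Python sketch. Everything in it is a stand-in: `fake_generate` is a stub for the actual model backend (llama.cpp, an OpenAI-style API, etc.), `run_tool` pretends to execute the tool, and the `<tool_call>` tag format is hypothetical since the real format depends on the model's chat template. The point is only the loop shape: when the reply ends with a tool call (where the model stops), the client executes the tool, appends the result, and asks the model to continue instead of treating that stop as final.

```python
import json

def fake_generate(prompt: str) -> str:
    """Stub for the model: it stops (ends its turn) right after a tool call."""
    if "RESULT" not in prompt:
        # The model emits a tool call and then its end-of-turn token, so the
        # client sees a "cut off" reply unless it continues the loop below.
        return '<tool_call>{"name": "mkdir", "args": {"path": "project"}}</tool_call>'
    return "Created the directory, continuing with the plan."

def run_tool(call_json: str) -> str:
    """Pretend to execute the requested tool and return its result."""
    call = json.loads(call_json)
    return f"RESULT: {call['name']} ok"

def chat(prompt: str, max_rounds: int = 5) -> str:
    """Auto-continue loop: keep generating until a reply has no tool call."""
    transcript = prompt
    for _ in range(max_rounds):
        reply = fake_generate(transcript)
        transcript += reply
        start = reply.find("<tool_call>")
        if start == -1:
            return reply  # normal completion, no tool call: we're done
        end = reply.find("</tool_call>")
        payload = reply[start + len("<tool_call>"):end]
        # Feed the tool result back and let the model keep going,
        # rather than stopping at the end token after the tool call.
        transcript += "\n" + run_tool(payload) + "\n"
    return transcript

print(chat("make a plan to set up the project"))
```

A coding-agent CLI normally runs a loop like this for you; if the stop token after the tool call isn't recognized as "tool call pending" (e.g. a chat-template mismatch), the loop exits early and the output looks truncated exactly as described.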
Do you by any chance know how? If you don't, I understand ngl, but if you do, let me know
just ask some AI to help you with it, I'm not sure what you're using or how
ok, thanks for the response btw, good luck there