The model keeps repeating responses after running for a while; update to the latest llama.cpp

#1
by xldistance - opened

I downloaded the ubergarm/Step-3.5-Flash-GGUF model and it runs normally, but your GGUF quantization and the AesSedai/Step-3.5-Flash-GGUF quant I tried both have issues: after a while they just keep repeating or produce meaningless output.

The quants work beautifully.

You MUST update to the latest version of llama.cpp (minimum build: version 7970, b7970-eb449cdfa). Make sure to use only the --jinja option: llama-cli -m Step-3.5-Flash-PRISM-LITE-IQ2_M.gguf --jinja

You may also turn "thinking" off with this model: llama-server -m Step-3.5-Flash-PRISM-LITE-IQ2_M.gguf --jinja --chat-template-kwargs '{"enable_thinking": false}'
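Once llama-server is running, you can sanity-check its output over the OpenAI-compatible chat endpoint. This is a minimal sketch assuming the server's default host and port (127.0.0.1:8080); adjust if you passed --host or --port:

```shell
# Query the running llama-server via its OpenAI-compatible endpoint.
# Assumes default host/port; change if you started the server differently.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```

If the repetition bug is fixed, the reply should be a single coherent sentence rather than looping text.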

[Screenshot: "One Shot Perfection!"]
