Kind of broken.
SOMETHING went wrong in the making of these quants, as Ollama's default quant outperforms all of them.
The issue is not with our quantization, as we just use the stock llama.cpp conversion without any code adjustments. As you can see, the looping and other issues occur in all uploads of the model regardless of the uploader.
Also, how did you test Ollama's quant? In our tests it performs very similarly to our quants and LM Studio's.
See:
Issue: https://huggingface.co/bartowski/zai-org_GLM-4.7-Flash-GGUF/discussions/1
Issue: https://huggingface.co/ngxson/GLM-4.7-Flash-GGUF/discussions/3
Issue: https://huggingface.co/noctrex/GLM-4.7-Flash-MXFP4_MOE-GGUF/discussions/1
We're investigating and following the llama.cpp thread to see what's going on: https://github.com/ggml-org/llama.cpp/pull/18936
IK. I am just putting this out so people know.
Wrote about it there as well: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/discussions/6
Ollama might be better, but still messes up code so easily.
How did you test Ollama's quant? In our tests it performs very similarly to our quants and LM Studio's.
ollama pull glm-4.7-flash:Q4_K_M
Write a snippet of python code that draws a cute kitty with Matplotlib