Model is not performing as well as GLM-4.5-Air-AWQ-FP16Mix

#1
by hareram241 - opened

The model's image understanding is quite decent (even coordinate questions about an image are answered accurately), but in coding tasks it does not perform that well, and it outputs the same result every time. Did anyone else notice the same?

QuantTrio org

If we carefully check the generation_config.json file, the default top_k is 1, which means there is no variation in the outputs.
We can certainly raise it, to something like 20 or 50, and change the default top_p to 0.9 or so.
But I guess this is how the GLM team tuned the model; changing those values could affect performance, but it is worth a try.
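For anyone who wants to experiment, here is a minimal sketch of overriding those defaults by editing generation_config.json directly. The baseline values mirror the top_k=1 default described above; the override values (top_k=50, top_p=0.9) are just the suggested starting points from this thread, not anything the GLM team recommends.

```python
import json

# Defaults as described above: top_k=1 effectively pins the output
# to the single most likely token, so every run is identical.
cfg = {"top_k": 1, "top_p": 1.0, "do_sample": True}

# Hypothetical relaxed values worth trying; tune to taste.
overrides = {"top_k": 50, "top_p": 0.9}
cfg.update(overrides)

# Write the edited config back out (path is illustrative).
with open("generation_config.json", "w") as f:
    json.dump(cfg, f, indent=2)

print(cfg)
```

Most serving stacks also let you pass top_k/top_p per request, which is safer than editing the shipped config if you want to compare settings side by side.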

Thanks for the quick quant! I adjusted the generation params a bit and the repetition of failing tool calls improved a lot, but the model did start making errors (mixing in Chinese and other weird token glitches).

I don't have the resources to create a quant of the model. I wanted to know if you could create one with 16-bit activations, and likewise with NVFP4. I am running on Blackwells, so this would perform even better, and 16-bit activations do not require any calibration data. -- I meant to post this on the full model, not the V variant.
