Does it make sense to have UD-IQ4_XS?

#4
by tarruda - opened

What led me to consider this is that Minimax M2.1 (it also happens with M2) sometimes gets stuck in a thinking loop on IQ4_XS, but I have never seen that happen on UD-Q3_K_XL. OTOH, I generally get better coding results with IQ4_XS (which is the biggest quant that can fit on 128GB Macs for ~230B LLMs such as Minimax and Qwen3).
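The "biggest quant that fits in 128GB" claim checks out with back-of-envelope math. A sketch, where the parameter count and bits-per-weight figures are rough assumptions, not exact values for any specific GGUF file:

```python
# Approximate weight storage for a ~230B-parameter model at two quant
# levels. The bpw values below are assumed averages, not exact figures.
PARAMS = 230e9  # ~230B parameters (Minimax/Qwen3 scale, approximate)

APPROX_BPW = {
    "UD-Q3_K_XL": 3.5,   # assumed average bits per weight
    "IQ4_XS": 4.25,      # assumed average bits per weight
}

def weights_gb(params: float, bpw: float) -> float:
    """Approximate weight storage in GB; ignores KV cache and runtime
    overhead, which also have to fit in the 128GB of unified memory."""
    return params * bpw / 8 / 1e9

for name, bpw in APPROX_BPW.items():
    print(f"{name}: ~{weights_gb(PARAMS, bpw):.0f} GB")
```

At ~4.25 bpw the weights alone land around 122 GB, leaving only a thin margin for context, which is consistent with IQ4_XS being the ceiling on these machines.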

I wonder if this thinking loop could be fixed by creating a UD version of IQ4_XS, which I assume means it would be calibrated with the Unsloth dataset.

My starting point is also a Mac with 128GB RAM. I am using the UD-Q3_K_XL version and am satisfied with the results. No looping so far. For comparison, I will test the IQ4_XS version tomorrow and report back. Perhaps a 3-bit MLX version with DWQ could also help you.


I don't have anything other than anecdotal evidence that IQ4_XS is better: in the few programming tests I ran, IQ4_XS always seemed to do better. Overall, though, UD-Q3_K_XL seems good enough.

BTW, I have seen similar looping with GLM 4.7 UD-IQ2_M. It is not always reproducible, and generally things work when I click the retry button.
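Since the looping isn't always reproducible, one way to catch it automatically instead of eyeballing the output is a crude repetition check. A minimal sketch (the window sizes and threshold are arbitrary assumptions, not a validated detector):

```python
# Heuristic for the "thinking loop" failure mode: flag an output whose
# tail keeps repeating the same chunk of text.
def looks_looped(text: str, n: int = 20, repeats: int = 3) -> bool:
    """Return True if the last n characters occur at least `repeats`
    times in the final portion of the text (a crude repetition check)."""
    tail = text[-n:]
    if len(tail) < n:
        return False
    window = text[-n * repeats * 2:]
    return window.count(tail) >= repeats

# A degenerate output repeating the same phrase trips the check,
# while ordinary prose does not.
looped = "Let me think. " * 50
normal = "The function compiles cleanly and all tests pass on the first run."
```

Something like this could drive an automatic retry instead of the manual retry button.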


Any hints on how I can run an OpenAI-compatible server with this? Also, does it support constrained output for tool calling?
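For GGUF files, llama.cpp's `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint. A sketch of an invocation; the model filename is a placeholder and the context size is an assumption to tune for your memory budget:

```shell
# Serve a local GGUF over an OpenAI-compatible HTTP API.
# Model path is a placeholder; point it at your downloaded quant.
llama-server -m ./MiniMax-M2.1-UD-Q3_K_XL.gguf --port 8080 --jinja -c 32768
# --jinja applies the model's own chat template, which llama-server
# relies on for OpenAI-style tool calling.
```

On constrained output: llama-server supports grammar-constrained generation (GBNF, and JSON-schema via the request's `response_format` field), which is what keeps tool-call arguments well-formed.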
