Comparison of different Quants?

#3
by Manuun1 - opened

First of all: Unsloth team, you rock! Always amazing work! I deeply appreciate all the effort you put in for the community.
Do you have an overview of the different quants and their output quality?

Or, in this specific case, would you rather recommend UD-Q3_K_XL or UD-IQ4_XS for MiniMax M2.7?

Thanks a lot in advance, and much appreciated! 😃

Well, Benjamin did some benchmarks for M2.5. Because MiniMax-M2.7 uses the same architecture as MiniMax-M2.5, GGUF quantization benchmarks for M2.7 should be very similar to those for M2.5, so we'll refer to the previous quant benchmarks conducted for M2.5. See: https://unsloth.ai/docs/models/minimax-m27#gguf-benchmarks
*(benchmark results chart)*
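As a rough way to reason about the Q3 vs. IQ4 tradeoff: a quantized GGUF's file size scales roughly with its average bits per weight. A minimal sketch of that arithmetic, where the parameter count and the bits-per-weight figures are illustrative assumptions, not measured values:

```python
# Rough GGUF file-size estimate: size_bytes ≈ n_params * bits_per_weight / 8.
def est_size_gb(n_params: float, bpw: float) -> float:
    """Estimated quantized file size in gigabytes."""
    return n_params * bpw / 8 / 1e9

N_PARAMS = 230e9  # assumed total parameter count (illustrative)

# Approximate average bits-per-weight per quant type (assumed, not measured):
quants = {"UD-Q2_K_XL": 2.8, "UD-Q3_K_XL": 3.9, "UD-IQ4_XS": 4.3}

for name, bpw in sorted(quants.items(), key=lambda kv: kv[1]):
    print(f"{name}: ~{est_size_gb(N_PARAMS, bpw):.0f} GB at ~{bpw} bpw")
```

The practical rule this encodes: pick the highest-bpw quant whose estimated size still fits your VRAM/RAM budget, then check it against the benchmark chart above.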

The UD-IQ2-XXS looks suspiciously good. Is that the go-to, or an error in the testing?
Also, would M2.7 work well with UD-IQ3-XXS, or is a higher quant preferable for the increased quality?

> The UD-IQ2-XXS looks suspiciously good. Is that the go-to, or an error in the testing?
> Also, would M2.7 work well with UD-IQ3-XXS, or is a higher quant preferable for the increased quality?

Well, we didn't do the tests, but it shouldn't be an error in the testing; probably just margin of error.
Higher quants are usually preferred for quality, of course.

> The UD-IQ2-XXS looks suspiciously good. Is that the go-to, or an error in the testing?

I would love to know how it compares to UD-Q2_K_XL (75.3 GB).

> Well, Benjamin did some benchmarks for M2.5. Because MiniMax-M2.7 uses the same architecture as MiniMax-M2.5, GGUF quantization benchmarks for M2.7 should be very similar to those for M2.5, so we'll refer to the previous quant benchmarks conducted for M2.5. See: https://unsloth.ai/docs/models/minimax-m27#gguf-benchmarks
> *(benchmark results chart)*

It's important to note that Benjamin's conclusion was that these models do not quantize well:

> Minimax M2.5 GGUFs (from Q4 down to Q1) perform poorly overall. None of them come close to the original model.
>
> That’s very different from my Qwen3.5 GGUF evaluations, where even TQ1_0 held up well enough.
>
> Lessons:
>
> - Models aren’t equally robust, even under otherwise very good quantization algorithms.
> - “Just take Q4, it’ll be fine” is a rule of thumb that doesn’t generalize.

https://x.com/bnjmn_marie/status/2027043753484021810?s=20

So what would be considered the best quant for efficiency if Q4 and below are too brain-dead?

Unsloth AI org


Compared with Qwen3.5, yes, the MiniMax models don't quantize well. But their quantization accuracy degradation is similar to that of other models that are not Qwen3.5. Qwen3.5 is an outlier when it comes to quantization, so you shouldn't use it as the comparison point.

> So what would be considered the best quant for efficiency if Q4 and below are too brain-dead?

They're not brain-dead, as you can see from the graph; they're just more sensitive to quantization than Qwen3.5.
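One way to make "more sensitive to quantization" concrete is to compare score retention, i.e. the quantized score divided by each model's own unquantized baseline, rather than raw scores across models. A minimal sketch; the model names and scores below are made-up placeholders, not real benchmark numbers:

```python
# Quantization robustness as score retention: quantized score / baseline score.
# All scores are made-up placeholders for illustration only.
baseline = {"robust_model": 80.0, "sensitive_model": 80.0}
quantized = {"robust_model": 78.4, "sensitive_model": 68.0}

retention = {m: quantized[m] / baseline[m] for m in baseline}
for name, r in sorted(retention.items(), key=lambda kv: -kv[1]):
    print(f"{name}: retains {r:.1%} of its unquantized score")
```

On this view, a model whose quants retain, say, 85% of baseline can still be perfectly usable even if another model retains 98%; "sensitive" is not the same as "broken".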
