Low-quality quant
I tried this quant with the oMLX backend server, and it seems pretty broken: it confuses simple numbers in tasks requiring calculation and goes into infinite loops. I tried the recommended sampling params as well as others.
It could also be that M2.7 is more sensitive to quantization?
@bibproj thanks,
I remember reading that MiniMax M2.x is more sensitive to aggressive quantization compared to Qwen3 or Qwen3.5 models.
In the past I tried (I think) the MiniMax M2.1 3-bit MLX quant with LM Studio, and it was OK.
Sorry, I've already deleted this 3-bit quant. I'm curious to try some of the mixed/dynamic MLX versions that seem popular now. I have a Mac Studio with 128 GB of memory, so I'm looking for something that fits with room left over for decent context.
ubergarm is quite good at this. He normally quantizes using ik_llama.cpp, with good results. It's not MLX, but it generally works on Macs too. You can find his quants for MiniMax-M2.7 at https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF. Try the smol-IQ3_KS version at https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF/tree/main/smol-IQ3_KS, which is 93.7 GB. That sounds about right for your 128 GB Mac Studio.
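For reference, here's a rough sketch of how you might pull and serve that quant. This assumes you have `huggingface-cli` installed and a llama.cpp (or ik_llama.cpp) build on your PATH; the exact shard filename inside the repo will differ, so adjust it after downloading:

```shell
# Download only the smol-IQ3_KS files from the repo
# (assumes huggingface-cli is installed: pip install -U huggingface_hub)
huggingface-cli download ubergarm/MiniMax-M2.7-GGUF \
    --include "smol-IQ3_KS/*" \
    --local-dir ./MiniMax-M2.7-GGUF

# Serve with llama-server; replace <first-shard> with the actual
# downloaded filename, and pick a context size that leaves headroom
# in 128 GB of unified memory alongside the ~94 GB of weights
llama-server \
    -m ./MiniMax-M2.7-GGUF/smol-IQ3_KS/<first-shard>.gguf \
    -c 16384 \
    --host 127.0.0.1 --port 8080
```

With multi-file GGUFs you only point `-m` at the first shard; the rest are picked up automatically.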