5-bit MiniMax M3 running locally on a single M3 Ultra 512GB via Unsloth!

#3
by danielhanchen - opened
Unsloth AI org

Hey guys, you can now run and train MiniMax M3 in Unsloth Studio (GitHub).
Recommended inference settings are set automatically (Guide).

Example of MiniMax M3 (5-bit GGUF) running in Unsloth Studio.

[Image: MiniMax M3 in Unsloth Studio]

I got this working on my 512GB M3 Ultra too. I was wondering, though: do these GGUFs still have the MTP heads, vision encoder, etc. intact (just not utilized), or will additional GGUF updates be needed to re-enable those features once llama.cpp and other runtimes pick them up? Thanks for all your work on this!
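One way to check this yourself is to list the tensor names stored in the GGUF and look for MTP or vision tensors. The sketch below is only a rough heuristic: the name fragments (`nextn`, `mtp`, `v.`, `mm.`) are assumptions based on names that appear in other llama.cpp conversions and are not confirmed for MiniMax M3, and the helper names `classify_tensor`, `summarize`, and `tensor_names_from_gguf` are made up for illustration. Vision encoders are often shipped as a separate mmproj GGUF, so their absence from the main file does not by itself mean they were dropped.

```python
from typing import Dict, Iterable, List

# ASSUMPTION: these fragments are guesses, not confirmed for MiniMax M3.
# "nextn" appears in some llama.cpp MTP tensor names; "v." and "mm." are
# common prefixes in multimodal projector (mmproj) GGUF files.
MTP_SUBSTRINGS = ("nextn", "mtp")
VISION_PREFIXES = ("v.", "mm.")


def classify_tensor(name: str) -> str:
    """Label a tensor name as 'mtp', 'vision', or 'other' by simple matching."""
    if any(fragment in name for fragment in MTP_SUBSTRINGS):
        return "mtp"
    if name.startswith(VISION_PREFIXES):
        return "vision"
    return "other"


def summarize(names: Iterable[str]) -> Dict[str, int]:
    """Count how many tensor names fall into each category."""
    counts = {"mtp": 0, "vision": 0, "other": 0}
    for name in names:
        counts[classify_tensor(name)] += 1
    return counts


def tensor_names_from_gguf(path: str) -> List[str]:
    """Read tensor names from one GGUF file (requires `pip install gguf`)."""
    from gguf import GGUFReader  # imported lazily so the helpers above work without it

    return [tensor.name for tensor in GGUFReader(path).tensors]


# Hypothetical tensor names, for illustration only.
example_names = [
    "token_embd.weight",
    "blk.0.attn_q.weight",
    "blk.61.nextn.eh_proj.weight",
    "v.blk.0.attn_q.weight",
]
print(summarize(example_names))
```

For a real file you would call `summarize(tensor_names_from_gguf(path))`; with a split GGUF, each shard has to be read separately.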

According to the vLLM docs, they made it work with EAGLE3 speculative decoding:

https://vllm.ai/blog/2026-06-12-minimax-m3-vllm

Speculative decoding: EAGLE3 support with the draft model released at Inferact/MiniMax-M3-EAGLE3.

Not sure if the same can be done with llama.cpp or Unsloth Studio?

What tokens per second do you get? Prompt processing (pp) and token generation (tg), please.
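For anyone reporting numbers: pp and tg are each just a token count divided by the wall-clock time of that phase, which is also how llama.cpp's `llama-bench` labels its results. A minimal sketch follows; the timings are made-up placeholders, not measurements of MiniMax M3, and `tokens_per_second` is a hypothetical helper.

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens per second for one phase of a run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


# Hypothetical timings, for illustration only (not real benchmark results).
prompt_tokens, prompt_seconds = 2048, 8.0
generated_tokens, generation_seconds = 256, 16.0

pp = tokens_per_second(prompt_tokens, prompt_seconds)         # prompt processing
tg = tokens_per_second(generated_tokens, generation_seconds)  # token generation
print(f"pp: {pp:.1f} t/s, tg: {tg:.1f} t/s")
```

Reporting the prompt length and generated length alongside the two rates makes results easier to compare, since both rates usually change with context size.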

EAGLE support got merged into mainline llama.cpp, but it might require another PR for MiniMax support.
