Do NOT use CUDA 13.2
#2
by danielhanchen - opened
Hey guys, please do not use CUDA 13.2 to run any quantized model or GGUF. Using CUDA 13.2 can produce gibberish or otherwise incorrect outputs, and can break tool calling for all models, including MiniMax-M2.
For now, you can:
- use our precompiled llama.cpp binary, which uses CUDA 13,
- use Unsloth Studio, which does not use CUDA 13.2, or
- use any CUDA version lower than 13.2.
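If you're unsure which CUDA toolkit you have, `nvcc --version` (or `torch.version.cuda` in PyTorch) will tell you. A minimal sketch of the version check in Python; the helper name is hypothetical and not part of any Unsloth or llama.cpp tooling:

```python
# Sketch: decide whether an installed CUDA toolkit version is the
# problematic 13.2 release. Pass in the version string reported by
# `nvcc --version` or `torch.version.cuda`.

def is_affected_cuda(version: str) -> bool:
    """Return True only for the broken CUDA 13.2 release."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) == (13, 2)

print(is_affected_cuda("13.2"))  # True  -> downgrade or use a prebuilt binary
print(is_affected_cuda("13.0"))  # False -> fine
print(is_affected_cuda("12.8"))  # False -> fine
```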
NVIDIA is working on a fix.
danielhanchen pinned discussion