Do NOT use CUDA 13.2

#2
by danielhanchen - opened

Hey guys, please do not use CUDA 13.2 to run any quantized model or GGUF. Using CUDA 13.2 can produce gibberish or otherwise incorrect outputs, and it breaks tool calling for all models, including MiniMax-M2.7.

For now, you can:

  • use our precompiled llama.cpp binary, which uses CUDA 13,
  • use Unsloth Studio, which does not use CUDA 13.2, or
  • use any CUDA version lower than 13.2.
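If you want to script the "any CUDA version lower than 13.2" check, here is a minimal sketch. The parsing of `nvcc --version` and the treatment of versions newer than 13.2 are my assumptions, not something from this post; on a system without `nvcc` you could read the driver's CUDA version from `nvidia-smi` instead.

```python
import re
import subprocess

def cuda_version():
    """Parse the CUDA toolkit version from `nvcc --version` output.

    Returns a (major, minor) tuple, or None if nvcc is not on PATH.
    """
    try:
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        return None
    m = re.search(r"release (\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None

def is_affected(version):
    """True for the broken CUDA 13.2.

    Anything lower is fine per the post; newer versions are also flagged
    here as a cautious assumption until NVIDIA ships a fix.
    """
    return version is not None and version >= (13, 2)

# Example with illustrative version tuples (use cuda_version() on a real box):
print(is_affected((13, 2)))  # True: avoid
print(is_affected((12, 8)))  # False: safe
```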

NVIDIA is working on a fix.

