Issue loading Qwen2.5-72B-Instruct-Q8_0 GGUF shards in Ollama - RTX PRO 6000 Blackwell
I am hitting a persistent "500 Internal Server Error: unable to load model" when trying to run this model in Ollama; the error message points at a blob in Ollama's storage directory.
My Environment:
GPU: NVIDIA RTX PRO 6000 Blackwell (96GB GDDR7 VRAM)
Driver: 591.59
Issue Detail: I have downloaded the two shards (00001-of-00002.gguf and 00002-of-00002.gguf). I’ve tried both:
Creating a Modelfile pointing directly to the first shard.
Merging the shards into a single file via copy /b.
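For reference, the Modelfile variant was essentially the following (the shard filename prefix is my guess at the release's naming; only the 00001-of-00002.gguf suffix is certain):

```
# Modelfile pointing at the first shard; path and filename prefix assumed
FROM ./Qwen2.5-72B-Instruct-Q8_0-00001-of-00002.gguf
```

followed by `ollama create qwen2.5-72b-q8 -f Modelfile`. I also suspect the copy /b merge may itself be invalid: if these shards were produced with llama.cpp's gguf-split, each shard carries its own GGUF header, so byte concatenation would not yield a well-formed file and llama-gguf-split --merge would be the correct way to combine them. I'd appreciate confirmation on that point too.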
In both cases, Ollama creates the blob successfully, but fails at the loading stage with: Error: 500 Internal Server Error: unable to load model: .../blobs/sha256-baa06c6fa24...
Request: Could you please verify whether the Q8_0 quantization or the shard splitting has any known issues on the newer Blackwell architecture (this card reports compute capability sm_120, not the datacenter sm_100)? And is there a specific reference checksum I should verify these shards against?
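In case it helps with the checksum question: the blob filename suggests Ollama names blobs by SHA-256 digest, so I computed local digests of each shard with a small script (chunked reads, since each shard is tens of GB) to compare against whatever reference hashes the release publishes. This is just my local verification sketch, not anything Ollama-specific:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file, reading in fixed-size chunks
    so multi-GB GGUF shards are never loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()
```

I ran this over both shard files and over the merged file; if someone can point me at the published hashes for this release, I can confirm whether the download itself is corrupt.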