antirez/deepseek-v4-gguf · Suggestion: Add IQ3 variants.

by tarruda - opened May 12

May 12

IQ3 might extract more of the original model's performance on devices with 128 GB of RAM. For example, I have an M1 Ultra with 128 GB, and it can run MiMo 2.5 (310B parameters) with unsloth's IQ3_XXS quant and 128k context.

This is fully GPU offloaded BTW.

m-i

May 13

I should have left this open: https://huggingface.co/antirez/deepseek-v4-gguf/discussions/3

Dredd

May 13

BatiAI's Q3_K_M is about 126 GB and fits in 128 GB of RAM with GPU offloading on Windows, though it is made for Macs as well.
(See the README for the batai.cpp fork; the mainline PR sounds nearly done.)

https://huggingface.co/batiai/DeepSeek-V4-Flash-GGUF/tree/main

m-i

May 13

You will have hardly any memory left for context unless you stream the routed experts from SSD with https://github.com/Anemll/anemll-flash-llama.cpp/tree/DeepSeek-V4-SSD
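
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It only does arithmetic: the function names are made up for illustration, the uniform bits-per-weight figure ignores the mixed tensor types real GGUF files use, and the per-token KV cache cost is a placeholder assumption, not a measured value for DeepSeek V4 or MiMo. It also ignores whatever the OS and the runtime reserve for themselves.

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB, assuming one uniform
    bits-per-weight value for every tensor (a simplification)."""
    return n_params * bits_per_weight / 8 / 1e9


def headroom_gb(total_ram_gb: float, model_gb: float, reserved_gb: float = 0.0) -> float:
    """Memory left after loading the weights and an optional reserve."""
    return total_ram_gb - model_gb - reserved_gb


def max_context_tokens(free_gb: float, kv_bytes_per_token: float) -> int:
    """Tokens of KV cache that fit in the remaining memory.

    kv_bytes_per_token is model-specific; any value passed here is an
    assumption unless taken from the actual model configuration.
    """
    if free_gb <= 0:
        return 0
    return int(free_gb * 1e9 // kv_bytes_per_token)


if __name__ == "__main__":
    # Figures from the thread: a ~126 GB Q3_K_M file on a 128 GB machine.
    free = headroom_gb(128, 126)
    print(f"headroom: {free:.1f} GB")

    # Hypothetical KV cost of 100 kB per token, for illustration only.
    print("tokens that fit:", max_context_tokens(free, 100_000))

    # A 310B-parameter model at an assumed flat 3.0 bits per weight.
    print(f"estimated size: {quant_size_gb(310e9, 3.0):.2f} GB")
```

With only about 2 GB left after the weights, the usable context depends almost entirely on the real per-token KV cost, which is why a smaller quant or streaming experts from SSD changes the picture.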

tarruda

May 13

@Dredd can you share a link to the PR?
