danielhanchen posted an update 2 days ago
You can now run Kimi K2.5 locally! πŸ”₯

We shrank the 1T-parameter model to 240GB (a 60% size reduction) via Dynamic 1-bit quantization.
Get >40 tok/s with 242GB of combined VRAM/RAM, or use the 622GB version for near-full precision.

GGUF: unsloth/Kimi-K2.5-GGUF

Guide: https://unsloth.ai/docs/models/kimi-k2.5
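
If you'd rather script it, here's a minimal sketch of downloading one quant and running a test prompt through llama-cpp-python. The UD-TQ1_0 quant name and the shard filename below are assumptions, not confirmed names from the repo, so check unsloth/Kimi-K2.5-GGUF for the actual files before downloading.

```python
# Minimal sketch: fetch one quant of Kimi K2.5 and run a test generation.
# Assumptions: the "UD-TQ1_0" folder and shard filename follow Unsloth's
# usual Dynamic 1-bit naming -- verify the real names on the repo page.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",
    local_dir="Kimi-K2.5-GGUF",
    allow_patterns=["*UD-TQ1_0*"],  # fetch only the 1-bit dynamic shards
)

llm = Llama(
    # Loading the first shard of a split GGUF pulls in the rest automatically.
    model_path="Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00006.gguf",
    n_gpu_layers=20,  # raise or lower to match your VRAM; -1 offloads everything
    n_ctx=8192,
)
out = llm("Why is the sky blue?", max_tokens=128)
print(out["choices"][0]["text"])
```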

locally my ass


Our guide shows how llama.cpp can offload across disk, RAM and VRAM, allocating the model optimally across whatever memory you have. For example, Mac unified-memory systems are well suited.
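
To make the VRAM/RAM split concrete, here's a rough back-of-the-envelope sketch for picking an n_gpu_layers value from a VRAM budget. The 61-layer count is an assumption for illustration, and MoE layers are far from uniform in size, so treat the result as a starting point rather than the guide's method.

```python
# Rough sketch: how many transformer layers fit in a given VRAM budget.
# Assumption: layers are roughly equal in size; in reality MoE expert
# tensors dominate, so this only gives a starting point for tuning.
def estimate_gpu_layers(model_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 4.0) -> int:
    """Return an n_gpu_layers value for llama.cpp, leaving headroom
    (reserve_gb) for the KV cache and compute buffers."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# Example: the 240GB 1-bit quant on a 24GB GPU (61 layers is an assumption).
print(estimate_gpu_layers(model_gb=240, n_layers=61, vram_gb=24))  # -> 5
```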

No Daniel, I cannot run Kimi K2.5 locally. Do I look like I'm rich? 😭


Sorry! We try our best to make these models smaller so more folks can run them.

I'll give this a try on our office GPU rig: VRAM is limited to 96GB, but there's 1024GB of DRAM attached to a 64-core Threadripper.

I doubt I'll be able to pull 40 TPS on this rig, but hey, local is local!
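
Assuming the 240GB figure from the post, a quick sanity check says that quant fits entirely in this rig's combined memory with plenty of headroom, so no disk offload needed:

```python
# Quick sanity check for the rig above: does the quant fit without disk offload?
vram_gb, dram_gb, quant_gb = 96, 1024, 240
headroom_gb = (vram_gb + dram_gb) - quant_gb
print(f"fits: {quant_gb <= vram_gb + dram_gb}, headroom: {headroom_gb} GB")
# -> fits: True, headroom: 880 GB
```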

Thx to all the Unsloth guys as usual!