danielhanchen posted an update 2 days ago
You can now run Kimi K2.5 locally! πŸ”₯

We shrank the 1T-parameter model to 240GB (a 60% size reduction) via Dynamic 1-bit quantization.
Get >40 tok/s with 242GB of combined VRAM/RAM, or use the 622GB version for near-full precision.

GGUF: unsloth/Kimi-K2.5-GGUF

Guide: https://unsloth.ai/docs/models/kimi-k2.5
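
If you'd rather script it, here's a minimal sketch of downloading one quant and running a test prompt through llama-cpp-python. The UD-TQ1_0 quant name and the shard filename below are assumptions, not confirmed names from the repo, so check unsloth/Kimi-K2.5-GGUF for the actual files before downloading.

```python
# Minimal sketch: fetch one quant of Kimi K2.5 and run a test generation.
# Assumptions: the "UD-TQ1_0" folder and shard filename follow Unsloth's
# usual Dynamic 1-bit naming -- verify the real names on the repo page.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",
    local_dir="Kimi-K2.5-GGUF",
    allow_patterns=["*UD-TQ1_0*"],  # fetch only the 1-bit dynamic shards
)

llm = Llama(
    # Loading the first shard of a split GGUF pulls in the rest automatically.
    model_path="Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00006.gguf",
    n_gpu_layers=20,  # raise or lower to match your VRAM; -1 offloads everything
    n_ctx=8192,
)
out = llm("Why is the sky blue?", max_tokens=128)
print(out["choices"][0]["text"])
```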

locally my ass


Our guide shows how llama.cpp can offload across disk, RAM and VRAM, allocating the model optimally across whatever memory you have. For example, Mac unified-memory systems are well suited.
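
To make the VRAM/RAM split concrete, here's a rough back-of-the-envelope sketch for picking an n_gpu_layers value from a VRAM budget. The 61-layer count is an assumption for illustration, and MoE layers are far from uniform in size, so treat the result as a starting point rather than the guide's method.

```python
# Rough sketch: how many transformer layers fit in a given VRAM budget.
# Assumption: layers are roughly equal in size; in reality MoE expert
# tensors dominate, so this only gives a starting point for tuning.
def estimate_gpu_layers(model_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 4.0) -> int:
    """Return an n_gpu_layers value for llama.cpp, leaving headroom
    (reserve_gb) for the KV cache and compute buffers."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# Example: the 240GB 1-bit quant on a 24GB GPU (61 layers is an assumption).
print(estimate_gpu_layers(model_gb=240, n_layers=61, vram_gb=24))  # -> 5
```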

No Daniel, I cannot run Kimi K2.5 locally. Do I look like I'm rich? 😭


Sorry! We try our best to make these models smaller so more folks can run them.

I'll give this a try on our office GPU rig: VRAM is limited to 96GB, but there's 1024GB of DRAM attached to a 64-core Threadripper.

I doubt I'll be able to pull 40 TPS on this rig, but hey, local is local!
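
Assuming the 240GB figure from the post, a quick sanity check says that quant fits entirely in this rig's combined memory with plenty of headroom, so no disk offload needed:

```python
# Quick sanity check for the rig above: does the quant fit without disk offload?
vram_gb, dram_gb, quant_gb = 96, 1024, 240
headroom_gb = (vram_gb + dram_gb) - quant_gb
print(f"fits: {quant_gb <= vram_gb + dram_gb}, headroom: {headroom_gb} GB")
# -> fits: True, headroom: 880 GB
```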

Thx to all the Unsloth guys as usual!