Request GGUF: Kwaipilot/KAT-Dev (62.4% (!) on SWE-Bench Verified with just 32B parameters)

#5
by Reverger - opened

Hi.
I apologize, this is probably the wrong place to put such requests.

But this model is so fantastic.
Could you please make your quants for it (IQ4 and IQ3)?

Hmm, it's a 32B dense model? I see you're already hoping for speculative decoding to speed it up too: https://huggingface.co/Kwaipilot/KAT-Dev/discussions/11

I'll keep an eye on it, but honestly right now I'm working through some issues with 403 errors when uploading, following the recent public quota changes on HF. I'm now a PRO subscriber and working with them to figure it out.

Assuming you have enough VRAM for full offload (especially on >=sm89 architectures like a 4090 GPU), I'd suggest checking out https://huggingface.co/ArtusDev/Kwaipilot_KAT-Dev-EXL3. @ArtusDev makes high-quality EXL3 quants that run on turboderp's https://github.com/turboderp-org/exllamav3, which also works with tabbyapi for a nice experience.

ArtusDev's quants will be very competitive with the iq3_kt type trellis quants I occasionally make.
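For a rough sense of whether full offload is feasible, here is a back-of-the-envelope sketch in Python. It assumes the model has exactly 32B parameters and a uniform bits-per-weight figure, and it counts the weights only. KV cache, activations, and runtime overhead are ignored, so treat the numbers as a lower bound. The `weight_gib` helper and the 24 GiB figure (a 4090-class card) are illustrative assumptions, not anything taken from exllamav3 or tabbyapi.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB.

    Assumes a uniform bits-per-weight across all tensors, which real
    quants only approximate.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    vram_gib = 24.0  # assumed: a single 24 GB card such as a 4090
    for bpw in (3.0, 3.5, 4.0, 5.0, 6.0):
        size = weight_gib(32, bpw)
        # Whatever is left over has to hold the KV cache and overhead.
        headroom = vram_gib - size
        print(f"{bpw:.1f} bpw: ~{size:.1f} GiB weights, ~{headroom:.1f} GiB left")
```

How much headroom you actually need depends on context length and cache settings, so this only tells you which bitrates are worth trying first.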
