Llama.cpp support

#2
by wsbagnsv1 - opened

Would it be possible to add support for this model in llama.cpp? Since it is based on Qwen3 and SigLIP, it shouldn't be too hard — both are already implemented in llama.cpp. The reason I ask is that at 7B this model would fit nicely on a 12 GB VRAM card even at Q8 quantization, but full precision is just too big for that /:
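As a rough sanity check on the sizing claim, here is a minimal sketch estimating weight memory for a 7B-parameter model at different bits-per-weight. The bits-per-weight figures for llama.cpp quant formats (Q8_0 ≈ 8.5 bpw, Q4_K_M ≈ 4.85 bpw) are approximate, and the estimate covers weights only — KV cache, activations, and the vision tower add overhead on top.

```python
# Rough VRAM estimate for model weights at a given quantization level.
# Weights only; real usage also needs KV cache and activations.
def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 2**30

N = 7e9  # assumed 7B parameters
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{weight_size_gib(N, bits):.1f} GiB")
```

Under these assumptions, FP16 lands around 13 GiB (over a 12 GB card's budget), while Q8_0 comes in under 7 GiB, which matches the point above.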

Thanks for opening this discussion — we’ve received it and have shared it with our algorithm team.

I’m not the best person to answer the details, so I don’t want to give you an inaccurate reply. The team is currently looking into it, and we’ll follow up with a more detailed response as soon as we have an update.

In the meantime, if you can share any extra info below, it will help us speed up the investigation:

- model/version
- your prompt + expected vs actual output
- a minimal reproducible example (sample input; redacted is OK)
- logs / request_id

As a thank-you for helping us improve Keye, we can offer:

- a small Kwai merch gift (currently shipping within Mainland China only due to shipping/policy restrictions)
- early access to upcoming Keye model updates/features

If you’d like either, just let us know (and for merch, you can share the shipping info later after we confirm).

For faster follow-up, you’re also welcome to join our communities:

- Discord: https://discord.gg/4Q6AmzxpEK
- WeChat ID: seeutomorrowo_O

Sorry again for the delay, and thanks for your patience — we’ll get back to you soon.
