Any Chance of GGUF Quantizations?

#5
by facehuggerfromspace - opened

I am really looking forward to giving this model a try on my Strix Halo. Is there any chance of us getting Q4 quantizations of this model? This could be a great replacement for GPT-OSS 120B.

Hi,
I agree.
A GGUF version would help raise awareness of this model, which, given the results presented, deserves to be better known and have greater visibility.
Thank you in advance.

Thank you for your suggestion.
We will work on adding support for GGUF and llama.cpp. According to our roadmap, we will prioritize Int8/Int4 quantization combined with vLLM capabilities.
Please stay tuned.
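In the meantime, for anyone who wants to try producing a quant themselves once converter support lands, this is a sketch of the standard llama.cpp workflow for turning a Hugging Face checkpoint into a Q4 GGUF. The model paths are placeholders, and it assumes this model's architecture is (or becomes) supported by llama.cpp's converter, which is not yet the case per the reply above.

```shell
# Build llama.cpp and install the converter's Python dependencies.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
pip install -r requirements.txt

# Convert the HF checkpoint to an FP16 GGUF.
# /path/to/model is a placeholder for the downloaded checkpoint directory.
python convert_hf_to_gguf.py /path/to/model --outfile model-f16.gguf

# Quantize to Q4_K_M, a commonly used 4-bit scheme.
./build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

Q4_K_M is usually a good quality/size trade-off for local inference; other presets (Q4_0, Q5_K_M, Q8_0) can be substituted in the last command.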
