https://huggingface.co/internlm/Intern-S2-Preview
Looks quite interesting: a Qwen3.5 35B MoE model that outperforms Gemini-3.1-Flash on several benchmarks.
They provide https://huggingface.co/internlm/Intern-S2-Preview-FP8 quants if that's useful.
This GGUF repo has observations that might be relevant: https://huggingface.co/crogers2287/Intern-S2-Preview-FP8-GGUF (it does not appear to have the Q4_K_M weights yet).
Thanks!
It's queued!
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Intern-S2-Preview-GGUF for quants to appear.
ERROR:hf-to-gguf:Model InternS2PreviewForConditionalGeneration is not supported
Hey, the model doesn't seem to be supported by current llama.cpp, or at least it wasn't a week ago. Remind me in a few days; we will update llama.cpp and try again. Hopefully it will work =)
llama.cpp lacks support for the entire InternS2PreviewForConditionalGeneration architecture. We don't need to retry until the following search yields a result: https://github.com/search?q=repo%3Aggml-org%2Fllama.cpp+InternS2PreviewForConditionalGeneration&type=code
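The same check can be done against a local llama.cpp checkout instead of the GitHub search. This is only a minimal sketch: it assumes the converter script is `convert_hf_to_gguf.py` at the root of the checkout and that supported architectures appear in it as quoted strings. The function names and the checkout path are hypothetical.

```python
from pathlib import Path

ARCH = "InternS2PreviewForConditionalGeneration"


def architecture_mentioned(script_text: str, arch: str) -> bool:
    """Return True if the architecture name appears as a quoted string."""
    return f'"{arch}"' in script_text or f"'{arch}'" in script_text


def checkout_supports(checkout_dir: str, arch: str) -> bool:
    """Look for the architecture name in a local checkout's converter script.

    Assumption: the script lives at <checkout_dir>/convert_hf_to_gguf.py.
    Returns False if the file is missing.
    """
    script = Path(checkout_dir) / "convert_hf_to_gguf.py"
    if not script.is_file():
        return False
    return architecture_mentioned(script.read_text(encoding="utf-8"), arch)
```

Calling `checkout_supports("path/to/llama.cpp", ARCH)` after each update would show whether a retry is worth queuing. A plain text match can give false positives (for example, if the name only appears in a comment), so treat a hit as a hint rather than proof.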
Thanks for the recommendation. I'm extremely interested in trying this model. It is apparently supported by vLLM, so I will run inference for it on Richard's supercomputer right away to try it out.
The conversion is described here:
and the patches are in https://huggingface.co/crogers2287/Intern-S2-Preview-FP8-GGUF/tree/main/patches
A higher level description is at the top of:
https://huggingface.co/crogers2287/Intern-S2-Preview-FP8-GGUF
That's the problem: we don't work with forks of llama.cpp, so until they merge it into main, we will not be able to process it.