https://huggingface.co/internlm/Intern-S2-Preview

#2395
by SkyMind - opened

Looks quite interesting: a Qwen3.5 35B MoE model that outperforms Gemini-3.1-Flash on several benchmarks.

https://huggingface.co/internlm/Intern-S2-Preview

They provide https://huggingface.co/internlm/Intern-S2-Preview-FP8 quants if that's useful.

This GGUF repo has observations that might be relevant: https://huggingface.co/crogers2287/Intern-S2-Preview-FP8-GGUF (it does not appear to have the Q4_K_M weights yet).

Thanks!

It's queued!
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Intern-S2-Preview-GGUF for quants to appear.

ERROR:hf-to-gguf:Model InternS2PreviewForConditionalGeneration is not supported

Hey, the model doesn't seem to be supported by current llama.cpp, at least it wasn't a week ago. Remind me in a few days; we will update llama.cpp and try again, hopefully it will work =)

llama.cpp lacks support for the entire InternS2PreviewForConditionalGeneration architecture. We don't need to retry until the following search yields a result: https://github.com/search?q=repo%3Aggml-org%2Fllama.cpp+InternS2PreviewForConditionalGeneration&type=code
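
For anyone who wants to script that check, here is a minimal Python sketch. It rebuilds the search URL above and, as an offline alternative, looks for the architecture name in the text of a local copy of the converter source. The function names are hypothetical, and treating a plain substring match as "supported" is my own assumption, not how llama.cpp itself decides.

```python
from urllib.parse import urlencode

# Architecture name taken from the converter error message in this thread.
ARCH = "InternS2PreviewForConditionalGeneration"


def build_search_url(repo: str, arch: str) -> str:
    """Return the GitHub code-search URL for `arch` inside `repo`."""
    # urlencode percent-escapes ':' and '/' and turns the space into '+'.
    query = urlencode({"q": f"repo:{repo} {arch}", "type": "code"})
    return f"https://github.com/search?{query}"


def arch_mentioned(source_text: str, arch: str) -> bool:
    """Heuristic (assumption): the name appears somewhere in the given text."""
    return arch in source_text
```

`build_search_url("ggml-org/llama.cpp", ARCH)` gives the same link as above. `arch_mentioned` can be fed the contents of whatever converter file you have checked out locally; a hit only suggests support has landed, so a retry is still the real test.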

Thanks for the recommendation. I'm extremely interested in trying this model. It is apparently supported by vLLM, so I will immediately run inference with it on Richard's supercomputer to try it out.

That's the problem: we are not working with forks of llama.cpp, so until they merge it into main, we will not be able to process it.
