Question on expert trimming and performance
#1
by
fpjnijweide
- opened
Hi guys,
First of all, love your work. This is a great release.
I was wondering how you decided which experts to prune, and whether you have benchmarked this slimmed-down version against the original in any way.
Thanks!
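(For context while waiting for the authors' answer: one common criterion for expert pruning in MoE models, not necessarily the one used here, is routing frequency, i.e. keep the experts the router actually selects most often over a calibration corpus. A minimal sketch with synthetic router logits; the function name and shapes are illustrative assumptions:)

```python
import numpy as np

def select_experts_to_keep(router_logits, keep):
    """Rank experts by how often the router picks them (top-1)
    over a calibration set, and keep the `keep` most-used ones.

    router_logits: (num_tokens, num_experts) array of router scores.
    Returns the sorted indices of the experts to retain.
    """
    top1 = router_logits.argmax(axis=1)                         # winning expert per token
    counts = np.bincount(top1, minlength=router_logits.shape[1])  # selection count per expert
    order = np.argsort(counts)[::-1]                            # most-used first
    return np.sort(order[:keep])

# Toy calibration: 1000 tokens routed over 8 experts,
# with expert 3 biased to win far more often than the rest.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 8))
logits[:, 3] += 2.0
kept = select_experts_to_keep(logits, keep=4)
print(kept)
```

Real pipelines would measure this on actual model activations and usually re-evaluate downstream quality after pruning, which is exactly what the question above is asking about.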
Hi,
That interests me too.
Would it be possible to share the procedure, and could the model conceivably be pruned even further?
Does the fact that it is a multimodal model pose any problems?
Is quantization (AWQ, GPTQ, WNA16, ...) planned for later?
Thank you in advance.
Kimi-K2.5 is already in int4.