IQ4_XS with iMatrix

#7
by Nexesenex - opened

Hey Ubergarm.

Could you quantize an IQ4_XS with imatrix and the embed and output tensors in q8_0?
That's what rolls best on a Core Ultra 265k.
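For reference, a request like this can be sketched with mainline llama.cpp's `llama-quantize`, which exposes per-tensor overrides for the token-embedding and output tensors alongside an importance matrix. The file paths below are placeholders, not the actual model files from this thread.

```shell
# Hedged sketch: quantize to IQ4_XS with an imatrix, forcing the
# embedding and output tensors to q8_0. Paths are illustrative.
./llama-quantize \
    --imatrix imatrix.dat \
    --token-embedding-type q8_0 \
    --output-tensor-type q8_0 \
    model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```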

P.S.: Thanks for the IQ5_K; the quality is top notch!

Heya! Oh interesting, I know that q8_0 is often the fastest for PP, but with a trade-off in TG speed due to memory bandwidth.

I'm getting other requests for something in that IQ4_KSS ~ IQ4_XS range, so maybe I'll spend some time fishing there for better perplexity... Maybe an iq4_k with full q8_0 attn/shexp/dense/token_embd/output? Or do you specifically need a mainline-compatible version?

@ubergarm : I couldn't resist making my own tests, so I'm cancelling my request.
For info, I downloaded Bart's q8_0, split it into tensors, and made my recipes with the help of your recipes and Thireus' work on individual tensor quants.
I settled (for now) on embeddings in q6_0, output in q8_0, ffn_down in q6_0, and ffn_up/gate in q5_0 (my CPU is quite slow with iqX_k quants, even more so with repacked ones).
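A per-tensor recipe like the one above can be expressed in one pass with ik_llama.cpp's `llama-quantize` and its `--custom-q` option, which takes comma-separated regex=type pairs. This is a hedged sketch: the tensor-name patterns and file paths are illustrative, not copied from the actual recipe files.

```shell
# Hedged sketch: per-tensor overrides via ik_llama.cpp's --custom-q.
# Regexes match GGUF tensor names; unmatched tensors fall back to the
# default type given as the last argument.
./llama-quantize --imatrix imatrix.dat \
    --custom-q "token_embd\.weight=q6_0,output\.weight=q8_0,ffn_down=q6_0,ffn_(up|gate)=q5_0" \
    model-f16.gguf model-custom.gguf q5_0
```

The alternative is to split the source GGUF into per-tensor files, quantize each, and reassemble, which is closer to what was described above; the single-pass form is just more convenient when the recipe is stable.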
