IQ4_KSS?

#6
by jpbwin - opened

Hello!!!

I know this **** takes time, but I see IQ4_KSS on the graph and not on the quant list. Do you plan on uploading it? It would be perfect for my machine with 2x 3090 and 64GB RAM.

Thanks again for all your hard work!

Thanks! I was not planning on releasing it because the nearby IQ4_XS has lower "better" perplexity. I know it may not be completely accurate, but I'm limiting the number of quants I upload to avoid hitting my public storage quota.

There might be a better mix in that 4bpw range but I spent most of the time searching in the lower sizes.

Can you fit the IQ4_XS or is it just a little too big for your rig?

Oh I'm just catching up on messages, I see some more requests now too hah

I have 64GB RAM and 48GB VRAM, so while I can run it, either the context is too small to be that useful, or pp / tg gets too slow if I throttle back the offloaded layers to make room for context. This model is on several knives' edges for people's hardware... good size for the capability

I'm digging around a little for something in that ~4bpw range. I tried an IQ3_K 90.518 GiB (3.948 BPW) which came in at a perplexity of 2.6156 +/- 0.01244 - not great, not terrible.

I'll try a smol-IQ4_KSS which may end up about the same size as the IQ3_K, but we'll see how it benchmarks on perplexity.

My home rig is 96GB DDR5-6400MT/s and a 3090TI FE 24GB VRAM so same ballpark as your setup.

Okay, the smol-IQ4_KSS 94.080 GiB (4.103 BPW) is about the sweetest spot I can find just below that IQ4_XS. I'm uploading it now and will post the new perplexity graphs (including all the failed versions; amusingly MXFP4 was not good, which lines up with what I would expect).
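For anyone sanity-checking the sizes quoted above, here is a rough sketch of the arithmetic. It assumes BPW is simply file size in bits divided by weight count, and it treats the 64GB RAM + 48GB VRAM figures from this thread as GiB. The helper names `implied_params` and `headroom_gib` are made up for illustration, and the headroom number ignores KV cache, compute buffers and the OS, so take it as an upper bound only.

```python
GIB = 2**30  # bytes per GiB


def implied_params(size_gib: float, bpw: float) -> float:
    """Approximate weight count implied by a file size and bits-per-weight."""
    return size_gib * GIB * 8 / bpw


def headroom_gib(size_gib: float, ram_gib: float, vram_gib: float) -> float:
    """Memory left over after the weights, before context and buffers (assumed GiB)."""
    return ram_gib + vram_gib - size_gib


# Sizes and BPW as quoted in this thread.
quants = {
    "IQ3_K": (90.518, 3.948),
    "smol-IQ4_KSS": (94.080, 4.103),
}

for name, (size, bpw) in quants.items():
    params_b = implied_params(size, bpw) / 1e9
    spare = headroom_gib(size, ram_gib=64, vram_gib=48)
    print(f"{name}: ~{params_b:.1f}B weights implied, ~{spare:.1f} GiB spare on 64+48")
```

Both quants should imply roughly the same weight count, which is a quick way to check that a size/BPW pair was copied correctly.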

Lemme know how it works out for you!

In my own experience using opencode, it can sometimes get a little stuck when trying to find out whether a file exists. I just watch it and touch the necessary file, and it seems to get back on track. Folks have noticed some looping on the official model too, so I don't think it's the quants: https://github.com/ggml-org/llama.cpp/pull/19283#issuecomment-3867824935

arigato gozaimasu and shit

Check the jinja template - it might be slightly different from the embedded one. It seems they changed it.
