Curious how Q2_K 75 GB compares to unsloth UD-Q2_K_XL 75.3 GB?

#6
by ha1ry - opened

Any insight?

I haven't run it, but take a look at some of my other models, which show that the newer ik quants tend to outperform mainline quants.

I assume by Q2_K you mean my IQ2_KS at 69.800 GiB (2.622 BPW)? Converted to GB, that comes out to about the 75 GB you mention.
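For anyone checking the GiB vs GB numbers, here is a minimal sketch of the conversion, assuming the usual definitions (1 GiB = 2^30 bytes, 1 GB = 10^9 bytes). The function name gib_to_gb is just an illustrative helper, not something from any tool mentioned here.

```python
GIB = 2**30  # bytes in one gibibyte (binary unit)
GB = 10**9   # bytes in one gigabyte (decimal unit)

def gib_to_gb(size_gib: float) -> float:
    """Convert a size in GiB to GB."""
    return size_gib * GIB / GB

# The IQ2_KS size quoted above
print(f"{gib_to_gb(69.8):.2f} GB")  # 74.95 GB
```

So a 69.8 GiB file shows up as roughly 75 GB when a tool reports decimal gigabytes, which lines up with the size in the question.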

Probably the IQ2_KS is better if they are the same size.

Do you compile your own ik_llama.cpp on Linux, or do you need Windows binaries? https://github.com/Thireus/ik_llama.cpp/releases

I'm going to try your IQ2_KS.

By the way, do you have an opinion on which version I should select if I download a new version of ik_llama.cpp? What would be ideal for my 1st gen Xeon Scalable (Gold) CPUs?

The link you provided offers these options:

Linux CPU-only:

Ubuntu x64 (CPU) AVX2
Ubuntu x64 (CPU) AVX512
Ubuntu x64 (CPU) AVX512 VNNI
Ubuntu x64 (CPU) AVX512 VNNI BF16
Ubuntu x64 (CPU) AVX512 VNNI VBMI
Ubuntu x64 (CPU) AVX512 VNNI VBMI BF16

Thanks!

@ha1ry

Oh jeez, if you can swing Linux, go for compiling it yourself and the build will pick up your exact CPU flags automatically...

Otherwise you probably need to run lscpu | grep avx and see what you have (or search for the CPU flags of your exact processor model), then pick the version that only uses flags you actually have...

To be safe, maybe try Ubuntu x64 (CPU) AVX512, which would likely work, but you might be leaving performance on the table if you have more flags.
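The "pick the version that matches your flags" step can be sketched roughly like this. It is only a sketch: the mapping from each release label to the flags it needs is my assumption (I have not checked how those binaries were actually built), and pick_build and read_cpu_flags are hypothetical helper names. The flag spellings are the ones Linux reports in /proc/cpuinfo.

```python
# ASSUMPTION: which cpuinfo flags each release label requires. Listed from most
# to least demanding so the first match is the most capable build.
BUILDS = [
    ("Ubuntu x64 (CPU) AVX512 VNNI VBMI BF16",
     {"avx512f", "avx512_vnni", "avx512vbmi", "avx512_bf16"}),
    ("Ubuntu x64 (CPU) AVX512 VNNI VBMI", {"avx512f", "avx512_vnni", "avx512vbmi"}),
    ("Ubuntu x64 (CPU) AVX512 VNNI BF16", {"avx512f", "avx512_vnni", "avx512_bf16"}),
    ("Ubuntu x64 (CPU) AVX512 VNNI", {"avx512f", "avx512_vnni"}),
    ("Ubuntu x64 (CPU) AVX512", {"avx512f"}),
    ("Ubuntu x64 (CPU) AVX2", {"avx2"}),
]

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU flags of the first core as a set (empty if unreadable)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def pick_build(flags):
    """Return the first build whose required flags are all present, else None."""
    for label, needed in BUILDS:
        if needed <= flags:
            return label
    return None

if __name__ == "__main__":
    print(pick_build(read_cpu_flags()))
```

As far as I know, first-gen Xeon Scalable (Skylake-SP) parts have the base AVX512 flags but predate VNNI, in which case this would land on the plain AVX512 build, but check your own lscpu output rather than trusting that.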
