is Imatrix better than the regular Quants?
i'm curious about it, I see two GGUF models one had the "i" for the Imatrix, but confused on which I should use.
Weighted/imatrix quants offer higher quality than static quants at the same model size and resource usage. If unsure, always use weighted/imatrix quants. I recommend you consult the quality column on our download page, linked in all our model cards. You can even select different metrics like KL divergence, perplexity, same-token probability, and eval results to check which quant best fits your needs.
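To make the metrics concrete: KL divergence measures how far the quantized model's next-token probability distribution drifts from the original model's, with 0 meaning identical. A minimal sketch (the token distributions below are made-up illustrative numbers, not real model outputs):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats between two next-token probability distributions.
    Lower means the quantized model stays closer to the original."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a tiny 4-token vocabulary:
original = [0.70, 0.20, 0.08, 0.02]   # full-precision model
quantized = [0.65, 0.23, 0.09, 0.03]  # quantized model

# A small positive value: the quant tracks the original closely.
divergence = kl_divergence(original, quantized)
```

Same-token probability is even simpler: the fraction of positions where both models pick the same top token.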
Thank you so much for the info.
Also see these benchmarks https://huggingface.co/mradermacher/BabyHercules-4x150M-GGUF/discussions/2#674a7958ce9bc37b8e33cf55
I was also curious about this, and I think imatrix might not be that good... when doing quantization you need a dataset; you use this dataset to "guide" the model and tell it what to give more attention to when making quants, which will "preserve" that specific data or writing style during compression.
So let's say a model is made specifically for roleplaying and story writing. If the person doing the quantization only has a dataset from Wikipedia, news, or technical books, for example, the model won't perform as expected compared to classic static quantization.
Unless the person has a dataset made entirely from story books, the model's behavior will be robotic and more logical rather than creative; it will lose emotional tone and lack good dialogue flow, which is expected from a roleplay/story-writing model.
This is interesting,
Is this how it works?
@Noire1 No, not at all. The imatrix dataset is only used to measure which weights in the model are important and so should be quantized at higher precision. The imatrix dataset does not in any way change the knowledge, behavior, or writing style of the model. It does the exact opposite: it tries to find a way to quantize the model to any desired size while keeping it as close to the original as possible. As you can see in the KL divergence, same-token probability, and top-token probability measurements, weighted/imatrix quants are far closer to the original unquantized model than static quants. This also applies to use cases and even languages not present in the imatrix dataset. Even training an importance matrix with random tokens will result in weighted/imatrix quants superior to same-sized static quants (someone even wrote a paper about it).
Regarding the question of what data should be included in an imatrix dataset, please read the discussion I had about this exact topic last week: https://huggingface.co/mradermacher/model_requests/discussions/1470