TQ1_0 quants
So I decided to test out the TQ1_0 quants. The good news: they're really good! The model is very coherent for its size. I was pleasantly surprised.
The bad news? They're not TQ1_0 quants. In fact, the entire file does not contain a single ternary-quantized tensor :) which is probably good, since llama.cpp lacks CUDA support for them. But maybe name it something like IQ1_XXS? :)
Thanks for trying them out, and glad to hear they're great.
Unfortunately, if we give them a different prefix (which we've always wanted to do), the model doesn't show up at all in our repo. We've wanted to rename them forever, but limitations like this don't allow us to. Also, the TQ1 quants were specifically made to be a single model file that isn't split, and starting with the next few models we might not upload TQ or 1-bit quants anymore.
> maybe from the next few models we might not upload Tq or 1-bit quants anymore
Please keep uploading them; they are super useful. For example, the MiniMax TQ1 quant is currently the only really good model that still fits into 64 GB of RAM.
Yeah, I think the small quants are really useful. Maybe get in touch with the Hugging Face team about allowing custom quant specs to appear in the downloads?