Massive clipping damage?

#1937
by rkh661 - opened

Hey Mradermacher and team, since I was just addressing Dan at Unsloth about his clipping of models: do you do your intermediate conversion to F16 for all quants? I see that you only offer F16 as the highest quant, when roughly 90% of models are BF16 native. I wonder whether you've been clipping and massively damaging all of your quants for a long time. I appreciate your help and work regardless, of course, but could you address that? Thanks

hi, unlike most of the other quanters on huggingface, we have our own custom fork that keeps the source precision, so if the model is BF16, the source quant stays BF16.
You can check here for the implementation: https://github.com/ggml-org/llama.cpp/compare/master...nicoboss:llama.cpp:mradermacher
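For anyone wondering what the clipping concern actually looks like in practice, here is a minimal sketch using numpy (not llama.cpp itself; the value 70000.0 is just a hypothetical outlier weight): BF16 keeps FP32's 8-bit exponent, so it covers magnitudes up to roughly 3.4e38, while FP16 has only a 5-bit exponent and tops out at 65504. Converting a BF16 tensor through an FP16 intermediate therefore overflows any outlier above that limit.

```python
import numpy as np

# Hypothetical outlier weight from a BF16-native model.
# BF16 shares FP32's exponent range, so 70000.0 is representable there.
bf16_like_weight = np.float32(70000.0)

# FP16's largest finite value is 65504; anything beyond overflows to inf.
as_fp16 = np.float16(bf16_like_weight)

print(np.finfo(np.float16).max)  # 65504.0
print(as_fp16)                   # inf -- the outlier is destroyed by the FP16 round-trip
```

Keeping the intermediate in the source precision (as the fork above does) avoids this round-trip entirely.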

God bless you, that's great to hear. Thank you, Richard! I think this has been happening with other quanters for a while? It's amazing to me that something so obvious could be so carelessly overlooked. Do you think you could offer a BF16 conversion for native models? I know you already do FP16. I'll have to start being really careful and doing my own BF16-native conversions to avoid clipping, and then if I obviously can't run a model at BF16, I can use your quants. But again, I can't say it enough: you and Bartowski, thank you sincerely for your contribution to the community, Richard.

well, mradermacher already does fp16; not sure why others are not doing that. I assume because llama.cpp doesn't do it by default? I don't know. You are always free to ask for any model that you need, even one you made yourself; if it's compatible with llama.cpp, then we can quant it for you =)
Thank you for your kind words, and have a great day =) Don't be scared to ask anything that might help everyone =)