But the hypocrisy meter just broke! They are accusing Chinese labs like DeepSeek, Minimax, and Kimi of "huge distillation attacks." The reality is that you can't loot the entire internet's library, lock the door, and then sue everyone else for reading through the window. Stop trying to gatekeep tech you didn't own in the first place. Read the complete article on it: https://huggingface.co/blog/Ujjwal-Tyagi/the-dark-underbelly-of-anthropic
But it's true that Moonshot AI did heavy distillation of Claude models to build Kimi K2.5: if you ask Kimi K2.5 "who are you?", it answers "I am Claude, built by Anthropic." Anthropic is trying to protect its profits, but it has a point about the safety of the community, because there is little trust in Chinese AI companies.
They aren't releasing their weights, so other studios have to do it the slow way. This seems like a huge waste of computation, and responding to it in any way other than a utilitarian one is only going to make the problem worse.
The reasonable solution would be to simply distribute curated distillations, preventing this sort of problem and reducing global power consumption.
Distillations that retain the expert layers are very difficult to finetune in a reasonable fashion; they often take more compute than the original did just to reach a similar state.
Distill, snap the experts off, and you have a distilled network that companies can run on their own hardware. Then people will stop trying to reverse engineer and bulk-extract information from your servers, and will instead use their own internal hardware in a different, more cost-effective fashion.
Make them good, reusable, and expandable within reason, and this problem will evolve into distillation research. By that point the next generation of big models will be out, and the next series of distillations can be made, obsoleting the previous ones.
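For reference, the "slow way" being described is ordinary knowledge distillation: train a small student model to match a teacher's temperature-softened output distribution instead of hard labels. A minimal sketch of that objective in plain Python (function names are illustrative, not taken from any lab's codebase):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, the standard
    distillation objective, scaled by T^2 to keep gradient magnitudes
    comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# A student that already matches the teacher incurs zero loss;
# a mismatched one incurs a positive loss to minimize.
teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher))               # 0.0
print(distillation_loss(teacher, [0.1, 0.2, 0.3]) > 0)   # True
```

The point about "snapping the experts off" amounts to distilling a mixture-of-experts teacher into a dense student with this kind of loss, so the artifact that ships has no routing machinery left to finetune around.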