
🤝 Open to Collab

Mohammed Hamdy

mmhamdy

hugging-science


https://surfingmanifolds.substack.com/

AI & ML interests

AI4Sci | NLP | Reinforcement Learning

Recent Activity

reacted to their post with 🧠 about 4 hours ago

It has been more than a decade now since the knowledge distillation paper came out. Knowledge Distillation (KD) is one of my favorite topics, but I have to confess that I'm not a huge fan of the term because I find it confusing (or at least, it has become so over time). The idea behind KD is not novel; it was there almost a decade before the paper came out (and arguably even a decade before that, back to 1990-91). But this paper is the one that clicked, the one that made the topic much more popular and introduced it to a broader audience. First, the timing and the authors played a big role: we have Geoffrey Hinton, Oriol Vinyals, and Jeff Dean here. And second, Geoffrey Hinton is really good at idea branding: Model compression?! No, no, no! Let's call it "Knowledge Distillation" and use evocative terms such as "Dark Knowledge" to describe what is being transferred. It's a great name, but as time has passed, the term has become a bit of a relic. KD is no longer solely about compression: it used to be introduced as a method for model compression, but now model compression is just one application of KD. The other thing is that the word "distillation" implies some sort of potency, that the student is somehow more powerful than the teacher, which is not the case (though counterarguments could be made, for example, that the student is more powerful than another model trained with no teacher). Nevertheless, the paper is incredibly well-written, short, and fun to read. It's one of the few papers that I have read several times. Check it out, and maybe share your thoughts on the topic with us here! If you had to choose another name for Knowledge Distillation, what would it be?

replied to their post about 4 hours ago


posted an update about 4 hours ago




liked a Space 8 months ago

Unlocking On-Policy Distillation for Any Model Family

Explore on-policy distillation visualization for any model

liked a dataset 8 months ago

transferable-samplers/many-peptides-md

Updated Dec 15, 2025 • 7.69k • 9

liked 3 Spaces 9 months ago

Science Release Heatmap

Explore AI4Science contributions by organizations and tags

Maintain the unmaintainable

Explore the complex relationships between 400+ machine learning models

Transformers Timeline

Interactive timeline to explore the 🤗Transformers models

liked a model 11 months ago

rednote-hilab/dots.ocr

Image-Text-to-Text • 3B • Updated Oct 31, 2025 • 288k • 1.32k

liked a dataset about 1 year ago

nvidia/Nemotron-Personas-USA

Viewer • Updated Dec 16, 2025 • 1M • 10.5k • 325

liked 2 models about 1 year ago

PlayHT/PlayDiffusion

Updated Jul 29, 2025 • 111

facebook/KernelLLM

Text Generation • 8B • Updated Jan 15 • 161 • 202

liked a model over 1 year ago

sesame/csm-1b

Text-to-Speech • 2B • Updated Dec 1, 2025 • 313k • 2.4k

liked a Space over 1 year ago

The Distill Template

Craft Beautiful Blogs

liked 2 models over 1 year ago

ElectricAlexis/NotaGen

Updated Feb 26, 2025 • 155

microsoft/wham

Updated Dec 17, 2025 • 114 • 271

liked a Space over 1 year ago

The Ultra-Scale Playbook

The ultimate guide to training LLM on large GPU Clusters

liked a model over 1 year ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10, 2025 • 16.1M • 6.38k

liked a dataset over 1 year ago

HuggingFaceH4/MATH-500

Viewer • Updated Dec 15, 2025 • 500 • 125k • 317

liked a model over 1 year ago

answerdotai/ModernBERT-base

Fill-Mask • 0.1B • Updated Jan 15, 2025 • 9.45M • 1.06k

liked a Space over 1 year ago

Scaling test-time compute

Boost LLM answers with flexible test‑time search strategies

liked a model over 1 year ago

CohereLabs/c4ai-command-r7b-12-2024

Text Generation • 8B • Updated Oct 30, 2025 • 27.2k • 425

liked a dataset over 1 year ago

CohereLabs/Global-MMLU

Viewer • Updated Aug 14, 2025 • 602k • 20.7k • 160