🤝 Open to Collab

Mohammed Hamdy

mmhamdy

hugging-science

https://surfingmanifolds.substack.com/

AI & ML interests

AI4Sci | NLP | Reinforcement Learning

Recent Activity

reacted to their post with 🧠 about 16 hours ago

It has been more than a decade now since the knowledge distillation paper came out. Knowledge Distillation (KD) is one of my favorite topics, but I have to confess that I'm not a huge fan of the term because I find it confusing (or at least, it has become so over time).

The idea behind KD is not novel; it was there almost a decade before the paper came out (and arguably even a decade before that, back to 1990-91). But this paper is the one that clicked, the one that made the topic much more popular and introduced it to a broader audience. First, the timing and the authors played a big role: we have Geoffrey Hinton, Oriol Vinyals, and Jeff Dean here. And second, Geoffrey Hinton is really good at idea branding: Model compression?! No, no, no! Let's call it "Knowledge Distillation" and use evocative terms such as "Dark Knowledge" to describe what is being transferred.

It's a great name, but as time has passed, the term has become a bit of a relic. KD is no longer solely about compression (KD used to be introduced as a method for model compression, but now model compression is just one application of KD). The other thing is that the word "distillation" implies some sort of potency, that the student is somehow more powerful than the teacher, which is not the case (though many counterarguments could be made: for example, the student can be more powerful than another model trained with no teacher).

Nevertheless, the paper is incredibly well-written, short, and fun to read. It's one of the few papers that I have read several times. Check it out, and maybe share your thoughts on the topic with us here!

If you had to choose another name for Knowledge Distillation, what would it be?

replied to their post about 16 hours ago

posted an update about 16 hours ago

upvoted a collection 9 days ago

📝 Research & Long-Form Blog Posts

In-depth technical articles and research pieces published by Hugging Face • 18 items • Updated 28 days ago • 34

upvoted an article 9 days ago

Article

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

+3

ariG23498, sayakpaul, sergiopaniego, ror, pcuenq

•

28 days ago

• 127

upvoted a collection 20 days ago

JEPA In Bioscience

Applications of JEPA self-supervised learning in single-cell transcriptomics, genomics, and proteomics • 4 items • Updated 20 days ago • 1

upvoted an article 7 months ago

Article

Continuous batching from first principles

+1

ror, ArthurZ, mcpotato

•

Nov 25, 2025

• 411

upvoted 2 articles 8 months ago

Article

Promoter-GPT: Writing DNA Instructions with Language Models

hugging-science

•

Oct 22, 2025

• 25

Article

Scaling Test-Time Compute to Achieve Gold Medal at IOI 2025 with Open-Weight Models

nvidia

•

Oct 20, 2025

• 19

upvoted an article 12 months ago

Article

FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages

davanstrien

•

Jul 8, 2025

• 35

upvoted an article about 1 year ago

Article

Explore, Build, and Innovate AI Reasoning with NVIDIA’s Open Models and Recipes

nvidia

•

Jun 4, 2025

• 23

upvoted a paper about 1 year ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 161

upvoted an article about 1 year ago

Article

The 4 Things Qwen-3’s Chat Template Teaches Us

cfahlgren1

•

Apr 30, 2025

• 88

upvoted a paper about 1 year ago

Text Generation Beyond Discrete Token Sampling

Paper • 2505.14827 • Published May 20, 2025 • 10

upvoted an article about 1 year ago

Article

Tiny Agents: an MCP-powered agent in 50 lines of code

julien-c

•

Apr 25, 2025

• 308

upvoted a paper about 1 year ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 209

upvoted a paper over 1 year ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7, 2025 • 124

upvoted an article over 1 year ago

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

+2

saurabhdash, olivernan, ArashAhmadian, johndang-cohere

•

Mar 4, 2025

• 78

upvoted a collection over 1 year ago

Cohere Labs Aya Vision

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated Jul 31, 2025 • 74

upvoted an article over 1 year ago

Article

Common AI Model Formats

ngxson

•

Feb 27, 2025

• 73

upvoted a collection over 1 year ago

CHASE

Generate challenging synthetic data to evaluate LLMs • 4 items • Updated Mar 2 • 4

upvoted a paper over 1 year ago

How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20, 2025 • 18

upvoted a collection over 1 year ago

Reasoning Datasets

50 items • Updated Jun 8, 2025 • 12