Add accordions for code snippets
app/src/content/article.mdx (changed)

````diff
@@ -55,6 +55,7 @@ pdfProOnly: false

 import HtmlEmbed from '../components/HtmlEmbed.astro'
 import Image from '../components/Image.astro'
+import Accordion from '../../../components/Accordion.astro'

 ## Introduction
 On-policy distillation is a highly effective strategy for compressing LLMs, as recently highlighted by [Thinking Machines' excellent blog post.](https://thinkingmachines.ai/blog/on-policy-distillation/) The technique trains a small "student" model by transferring knowledge from a high-performing "teacher" model's probability distribution. This allows the student to emulate the teacher's task performance, while significantly reducing size and latency.
@@ -386,8 +387,8 @@ Starting from the above checkpoint from SFT, we used the [`GKDTrainer`](https://

 If you want to try out knowledge distillation for yourself on your own use case, or a dataset from the hub, the recipe is available below.

-SNIPPETS

+<Accordion title="SFT Recipe">
 ```bash
 accelerate launch \
 --config_file examples/accelerate_configs/multi_gpu.yaml trl/scripts/sft.py \
@@ -414,8 +415,10 @@ accelerate launch \
 --lr_scheduler_type cosine_with_min_lr \
 --use_liger_kernel
 ```
+</Accordion>


+<Accordion title="Distillation Recipe">
 ```bash
 accelerate launch \
 --config_file examples/accelerate_configs/multi_gpu.yaml trl/experimental/gold/gold.py \
@@ -458,6 +461,7 @@ accelerate launch \
 --warmup_ratio 0.05 \
 --lr_scheduler_type cosine_with_min_lr
 ```
+</Accordion>

 ## Conclusion

````
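For context on what the distillation recipe in the diff optimizes: on-policy distillation trains the student to match the teacher's per-token probability distribution, typically via a KL-divergence loss on the logits. A minimal, self-contained sketch of a forward-KL distillation loss at a single token position (a toy illustration in plain Python, not TRL's `GKDTrainer` implementation) might look like:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL(teacher || student) for a single token position.

    A toy sketch of the objective a distillation trainer minimizes;
    real trainers compute this over every token of a batch of
    (often student-generated) sequences.
    """
    s = softmax([x / temperature for x in student_logits])
    t = softmax([x / temperature for x in teacher_logits])
    # KL is zero when the distributions match, positive otherwise.
    return sum(p * (math.log(p) - math.log(q))
               for p, q in zip(t, s) if p > 0)
```

In practice the loss is averaged over all token positions and backpropagated through the student only; the teacher's logits are treated as fixed targets.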