cmpatino HF Staff committed on
Commit 5b79479 · 1 Parent(s): 433b179

Add accordions for code snippets

Files changed (1):
  1. app/src/content/article.mdx +5 -1
app/src/content/article.mdx CHANGED
@@ -55,6 +55,7 @@ pdfProOnly: false
 
 import HtmlEmbed from '../components/HtmlEmbed.astro'
 import Image from '../components/Image.astro'
+import Accordion from '../../../components/Accordion.astro'
 
 ## Introduction
 On-policy distillation is a highly effective strategy for compressing LLMs, as recently highlighted by [Thinking Machines' excellent blog post](https://thinkingmachines.ai/blog/on-policy-distillation/). The technique trains a small "student" model by transferring knowledge from a high-performing "teacher" model's probability distribution. This allows the student to emulate the teacher's task performance while significantly reducing size and latency.
@@ -386,8 +387,8 @@ Starting from the above checkpoint from SFT, we used the [`GKDTrainer`](https://
 
 If you want to try out knowledge distillation for yourself on your own use case, or a dataset from the hub, the recipe is available below.
 
-SNIPPETS
 
+<Accordion title="SFT Recipe">
 ```bash
 accelerate launch \
 --config_file examples/accelerate_configs/multi_gpu.yaml trl/scripts/sft.py \
@@ -414,8 +415,10 @@ accelerate launch \
 --lr_scheduler_type cosine_with_min_lr \
 --use_liger_kernel
 ```
+</Accordion>
 
 
+<Accordion title="Distillation Recipe">
 ```bash
 accelerate launch \
 --config_file examples/accelerate_configs/multi_gpu.yaml trl/experimental/gold/gold.py \
@@ -458,6 +461,7 @@ accelerate launch \
 --warmup_ratio 0.05 \
 --lr_scheduler_type cosine_with_min_lr
 ```
+</Accordion>
 
 ## Conclusion
 
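For readers who want the gist of the objective behind the distillation recipe without launching the full run, here is a minimal sketch of the loss a distillation trainer minimizes. This is only the forward-KL special case (TRL's `GKDTrainer` actually optimizes a generalized Jensen-Shannon divergence with an interpolation coefficient); the function name and tensor shapes below are illustrative, not TRL API.

```python
import torch
import torch.nn.functional as F

def forward_kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Per-token forward KL(teacher || student), averaged over batch and sequence.

    Both logit tensors have shape (batch, seq_len, vocab_size); temperature
    softens both distributions before they are compared.
    """
    teacher_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), summed over the vocab axis
    kl_per_token = (teacher_logp.exp() * (teacher_logp - student_logp)).sum(-1)
    return kl_per_token.mean()

# Illustrative shapes only: 2 sequences, 5 tokens, a 32-token toy vocabulary.
teacher_logits = torch.randn(2, 5, 32)
student_logits = torch.randn(2, 5, 32)
loss = forward_kl_distillation_loss(student_logits, teacher_logits)
```

The "on-policy" part of on-policy distillation changes where `student_logits` come from (completions sampled from the student itself, scored by the teacher), not the shape of this loss.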