Add accordions for code snippets
app/src/content/article.mdx (changed)

````diff
@@ -55,6 +55,7 @@ pdfProOnly: false

 import HtmlEmbed from '../components/HtmlEmbed.astro'
 import Image from '../components/Image.astro'
+import Accordion from '../../../components/Accordion.astro'

 ## Introduction
 On-policy distillation is a highly effective strategy for compressing LLMs, as recently highlighted by [Thinking Machines' excellent blog post.](https://thinkingmachines.ai/blog/on-policy-distillation/) The technique trains a small "student" model by transferring knowledge from a high-performing "teacher" model's probability distribution. This allows the student to emulate the teacher's task performance, while significantly reducing size and latency.
@@ -386,8 +387,8 @@ Starting from the above checkpoint from SFT, we used the [`GKDTrainer`](https://

 If you want to try out knowledge distillation for yourself on your own use case, or a dataset from the hub, the recipe is available below.

-SNIPPETS

+<Accordion title="SFT Recipe">
 ```bash
 accelerate launch \
 --config_file examples/accelerate_configs/multi_gpu.yaml trl/scripts/sft.py \
@@ -414,8 +415,10 @@ accelerate launch \
 --lr_scheduler_type cosine_with_min_lr \
 --use_liger_kernel
 ```
+</Accordion>


+<Accordion title="Distillation Recipe">
 ```bash
 accelerate launch \
 --config_file examples/accelerate_configs/multi_gpu.yaml trl/experimental/gold/gold.py \
@@ -458,6 +461,7 @@ accelerate launch \
 --warmup_ratio 0.05 \
 --lr_scheduler_type cosine_with_min_lr
 ```
+</Accordion>

 ## Conclusion

````
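For context on what the distillation recipe in the diff optimizes: on-policy distillation trains the student to match the teacher's per-token probability distribution, typically via a KL-divergence loss on the logits. A minimal, self-contained sketch of a forward-KL distillation loss at a single token position (a toy illustration in plain Python, not TRL's `GKDTrainer` implementation) might look like:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL(teacher || student) for a single token position.

    A toy sketch of the objective a distillation trainer minimizes;
    real trainers compute this over every token of a batch of
    (often student-generated) sequences.
    """
    s = softmax([x / temperature for x in student_logits])
    t = softmax([x / temperature for x in teacher_logits])
    # KL is zero when the distributions match, positive otherwise.
    return sum(p * (math.log(p) - math.log(q))
               for p, q in zip(t, s) if p > 0)
```

In practice the loss is averaged over all token positions and backpropagated through the student only; the teacher's logits are treated as fixed targets.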