Merge branch 'main' of https://huggingface.co/spaces/HuggingFaceH4/general-on-policy-logit-distillation
app/src/content/article.mdx
CHANGED
@@ -124,9 +124,9 @@ Figure 2: Diagram of sequence and vocabulary misalignments caused by differences
 
 ULD lifts the tokenizer restriction but remains limited to offline setups. Next, we introduce our core contribution — **General On-Policy Logit Distillation (GOLD)** — which extends ULD into the on-policy setting with improved alignment techniques.
 
-## General On-
+## General On-Policy Logit Distillation (GOLD)
 
-While Universal Logit Distillation (ULD) allows training models with different tokenizers, its methods for sequence and vocabulary alignment have limitations. We developed General
+While Universal Logit Distillation (ULD) allows training models with different tokenizers, its methods for sequence and vocabulary alignment have limitations. We developed General On-Policy Logit Distillation (GOLD), an algorithm that extends ULD by introducing improved vocabulary alignment techniques.
 
 ### Sequence Alignment
 
@@ -385,7 +385,7 @@ accelerate launch \
 
 ```bash
 accelerate launch \
-    --config_file examples/accelerate_configs/multi_gpu.yaml trl/
+    --config_file examples/accelerate_configs/multi_gpu.yaml trl/experimental/gold/gold.py \
     --model_name_or_path <sft-model> \
     --dtype auto \
     --attn_implementation kernels-community/flash-attn \
 
@@ -428,7 +428,7 @@ accelerate launch \
 
 ## Conclusion
 
-In this post, we introduced General On-
+In this post, we introduced General On-Policy Logit Distillation (GOLD), a new method that enables effective on-policy knowledge distillation between models, even when the teacher and student do not share the same tokenizer vocabulary. This overcomes a significant limitation of existing on-policy methods like GKD, which require matched tokenizers.
 
 GOLD builds upon the offline ULD method but extends it to the on-policy setting and, critically, addresses its two main weaknesses. First, we replace ULD's naive sequence truncation with a token-merging strategy that sums log probabilities of mismatched tokens. Second, we implement a hybrid vocabulary alignment method that uses a direct-mapping loss for shared tokens and falls back to ULD's sorting method only for unmatched tokens.
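The token-merging strategy described in the conclusion rests on a simple identity: the probability of a merged span is the product of its pieces, so log-probabilities add. A minimal sketch of that idea follows; the function name and the greedy character-length alignment heuristic are illustrative assumptions, not the actual GOLD/TRL implementation.

```python
# Sketch of token merging for sequence alignment: when several student
# tokens span one teacher token, their log-probs are summed so that
# log P("token") = log P("tok") + log P("en").
# The greedy character-length matching below is an assumption for
# illustration, not the actual GOLD/TRL code.

def merge_logprobs(student_tokens, student_logprobs, teacher_tokens):
    """Greedily group student tokens to cover each teacher token,
    summing the log-probs of every group into a single value."""
    merged = []
    i = 0
    for t_tok in teacher_tokens:
        acc, span = 0.0, ""
        while i < len(student_tokens) and len(span) < len(t_tok):
            acc += student_logprobs[i]
            span += student_tokens[i]
            i += 1
        merged.append(acc)
    return merged

# Teacher splits the text as ["token", "ization"]; the student splits it
# as ["tok", "en", "ization"], so its first two log-probs are merged.
out = merge_logprobs(
    ["tok", "en", "ization"], [-0.5, -0.25, -1.0], ["token", "ization"]
)
print(out)  # [-0.75, -1.0]
```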
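The hybrid vocabulary alignment from the conclusion can likewise be sketched as two terms: a direct comparison of probabilities for tokens present in both vocabularies, plus a ULD-style comparison of sorted probabilities for the unmatched remainder. The L1 loss form and the dict-based interface here are assumptions for illustration only.

```python
# Sketch of hybrid vocabulary alignment: shared tokens get a
# direct-mapping loss, unmatched tokens fall back to ULD's
# sorted-probability comparison. L1 distance and the dict interface
# are illustrative assumptions, not the TRL implementation.

def hybrid_vocab_loss(teacher_probs, student_probs):
    """teacher_probs / student_probs: dicts mapping token -> probability."""
    shared = teacher_probs.keys() & student_probs.keys()
    # Direct-mapping term: compare shared tokens one-to-one.
    direct = sum(abs(teacher_probs[t] - student_probs[t]) for t in shared)
    # ULD-style fallback: sort the unmatched probabilities descending and
    # compare rank by rank, padding the shorter list with zeros.
    t_rest = sorted((p for t, p in teacher_probs.items() if t not in shared),
                    reverse=True)
    s_rest = sorted((p for t, p in student_probs.items() if t not in shared),
                    reverse=True)
    n = max(len(t_rest), len(s_rest))
    t_rest += [0.0] * (n - len(t_rest))
    s_rest += [0.0] * (n - len(s_rest))
    fallback = sum(abs(a - b) for a, b in zip(t_rest, s_rest))
    return direct + fallback

teacher = {"cat": 0.5, "dog": 0.3, "fish": 0.2}
student = {"cat": 0.4, "dog": 0.4, "bird": 0.2}
print(hybrid_vocab_loss(teacher, student))  # ≈ 0.2
```

With the toy distributions above, "cat" and "dog" are compared directly (contributing 0.1 + 0.1), while "fish" and "bird" are unmatched and compared by sorted rank (contributing 0), so the loss is approximately 0.2.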