Commit 8e331e8 by edbeeching · 2 parents: 145ff8f, cc26bbe

Merge branch 'main' of https://huggingface.co/spaces/HuggingFaceH4/general-on-policy-logit-distillation

Files changed (1):
  1. app/src/content/article.mdx +4 -4
app/src/content/article.mdx CHANGED
@@ -124,9 +124,9 @@ Figure 2: Diagram of sequence and vocabulary misalignments caused by differences
 
 ULD lifts the tokenizer restriction but remains limited to offline setups. Next, we introduce our core contribution — **General On-Policy Logit Distillation (GOLD)** — which extends ULD into the on-policy setting with improved alignment techniques.
 
-## General On-policy Logit Distillation (GOLD)
+## General On-Policy Logit Distillation (GOLD)
 
-While Universal Logit Distillation (ULD) allows training models with different tokenizers, its methods for sequence and vocabulary alignment have limitations. We developed General on-policy Logit Distillation (GOLD), an algorithm that extends ULD by introducing improved vocabulary alignment techniques.
+While Universal Logit Distillation (ULD) allows training models with different tokenizers, its methods for sequence and vocabulary alignment have limitations. We developed General On-Policy Logit Distillation (GOLD), an algorithm that extends ULD by introducing improved vocabulary alignment techniques.
 
 ### Sequence Alignment
 
@@ -385,7 +385,7 @@ accelerate launch \
 
 ```bash
 accelerate launch \
-  --config_file examples/accelerate_configs/multi_gpu.yaml trl/scripts/gkd.py \
+  --config_file examples/accelerate_configs/multi_gpu.yaml trl/experimental/gold/gold.py \
   --model_name_or_path <sft-model> \
   --dtype auto \
   --attn_implementation kernels-community/flash-attn \
@@ -428,7 +428,7 @@ accelerate launch \
 
 ## Conclusion
 
-In this post, we introduced General On-policy Logit Distillation (GOLD), a new method that enables effective on-policy knowledge distillation between models, even when the teacher and student do not share the same tokenizer vocabulary. This overcomes a significant limitation of existing on-policy methods like GKD, which require matched tokenizers.
+In this post, we introduced General On-Policy Logit Distillation (GOLD), a new method that enables effective on-policy knowledge distillation between models, even when the teacher and student do not share the same tokenizer vocabulary. This overcomes a significant limitation of existing on-policy methods like GKD, which require matched tokenizers.
 
 GOLD builds upon the offline ULD method but extends it to the on-policy setting and, critically, addresses its two main weaknesses. First, we replace ULD's naive sequence truncation with a token-merging strategy that sums log probabilities of mismatched tokens. Second, we implement a hybrid vocabulary alignment method that uses a direct-mapping loss for shared tokens and falls back to ULD's sorting method only for unmatched tokens.
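The token-merging strategy mentioned in the conclusion above can be illustrated with a minimal sketch. This is not TRL's actual implementation: the function name `merge_logprobs`, the greedy left-to-right merge, and the example tokens are ours, and we assume each target token is exactly a concatenation of consecutive source tokens. The key fact it relies on is the chain rule of probability: log P(t1 t2) = log P(t1) + log P(t2 | t1), so summing the log-probabilities of a merged span gives the log-probability of the merged text.

```python
import math

def merge_logprobs(tokens, logprobs, target_tokens):
    """Align one tokenizer's segmentation of a string to another's by
    greedily merging consecutive tokens and summing their log-probs.

    Assumes each target token is a concatenation of consecutive source
    tokens (a simplification; real boundaries can cross both ways).
    """
    merged = []
    i = 0
    for target in target_tokens:
        text, lp = "", 0.0
        # Consume source tokens until they spell out the target token.
        while text != target:
            text += tokens[i]
            lp += logprobs[i]
            i += 1
        merged.append(lp)
    return merged

# Hypothetical example: the student splits "unhappy" into two tokens,
# while the teacher keeps it as a single token.
student_tokens = ["un", "happy"]
student_logprobs = [math.log(0.5), math.log(0.4)]
teacher_tokens = ["unhappy"]

merged = merge_logprobs(student_tokens, student_logprobs, teacher_tokens)
# merged[0] == log(0.5) + log(0.4) == log(0.2)
```

In GOLD this merge replaces ULD's naive truncation, so the student's distribution is compared against the teacher's at matching text spans rather than at mismatched token positions.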