sergiopaniego HF Staff committed on
Commit cc26bbe · verified · 1 Parent(s): a16c36b

General On-Policy Logit Distillation

Files changed (1)
  1. app/src/content/article.mdx +3 -3

app/src/content/article.mdx CHANGED
@@ -129,9 +129,9 @@ Figure 2: Diagram of sequence and vocabulary misalignments caused by differences
 
 ULD lifts the tokenizer restriction but remains limited to offline setups. Next, we introduce our core contribution — **General On-Policy Logit Distillation (GOLD)** — which extends ULD into the on-policy setting with improved alignment techniques.
 
-## General On-policy Logit Distillation (GOLD)
+## General On-Policy Logit Distillation (GOLD)
 
-While Universal Logit Distillation (ULD) allows training models with different tokenizers, its methods for sequence and vocabulary alignment have limitations. We developed General on-policy Logit Distillation (GOLD), an algorithm that extends ULD by introducing improved vocabulary alignment techniques.
+While Universal Logit Distillation (ULD) allows training models with different tokenizers, its methods for sequence and vocabulary alignment have limitations. We developed General On-Policy Logit Distillation (GOLD), an algorithm that extends ULD by introducing improved vocabulary alignment techniques.
 
 ### Sequence Alignment
 
@@ -433,7 +433,7 @@ accelerate launch \
 
 ## Conclusion
 
-In this post, we introduced General On-policy Logit Distillation (GOLD), a new method that enables effective on-policy knowledge distillation between models, even when the teacher and student do not share the same tokenizer vocabulary. This overcomes a significant limitation of existing on-policy methods like GKD, which require matched tokenizers.
+In this post, we introduced General On-Policy Logit Distillation (GOLD), a new method that enables effective on-policy knowledge distillation between models, even when the teacher and student do not share the same tokenizer vocabulary. This overcomes a significant limitation of existing on-policy methods like GKD, which require matched tokenizers.
 
 GOLD builds upon the offline ULD method but extends it to the on-policy setting and, critically, addresses its two main weaknesses. First, we replace ULD's naive sequence truncation with a token-merging strategy that sums log probabilities of mismatched tokens. Second, we implement a hybrid vocabulary alignment method that uses a direct-mapping loss for shared tokens and falls back to ULD's sorting method only for unmatched tokens.
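The two fixes that the edited conclusion describes can be sketched in a few lines. This is a minimal illustrative reconstruction, not the article's actual implementation (which operates on logit tensors): the function names `merge_logprobs` and `hybrid_vocab_loss`, the L1 distance, and the zero-padding of unmatched ranks are all assumptions made here for clarity.

```python
import math

def merge_logprobs(logprobs, boundaries):
    """Sequence alignment by token merging: when one tokenizer splits a text
    span that the other keeps whole, sum the log-probabilities of the pieces
    (log P(a, b) = log P(a) + log P(b)) so both models score the same span,
    instead of truncating the sequence as ULD does. `boundaries` lists how
    many consecutive pieces make up each merged span."""
    merged, i = [], 0
    for n in boundaries:
        merged.append(sum(logprobs[i:i + n]))
        i += n
    return merged

def hybrid_vocab_loss(p_teacher, p_student):
    """Hybrid vocabulary alignment: tokens spelled identically in both
    vocabularies are compared directly (direct-mapping loss); only the
    leftover, unmatched tokens fall back to ULD's sort-and-compare trick.
    Inputs are dicts mapping token string -> probability."""
    shared = p_teacher.keys() & p_student.keys()
    # Direct mapping: compare probabilities of identically-spelled tokens.
    direct = sum(abs(p_teacher[t] - p_student[t]) for t in shared)
    # ULD-style fallback: sort the remaining probability masses and
    # compare them rank by rank, padding the shorter list with zeros.
    rest_t = sorted((p_teacher[t] for t in p_teacher.keys() - shared), reverse=True)
    rest_s = sorted((p_student[t] for t in p_student.keys() - shared), reverse=True)
    k = max(len(rest_t), len(rest_s))
    rest_t += [0.0] * (k - len(rest_t))
    rest_s += [0.0] * (k - len(rest_s))
    fallback = sum(abs(a - b) for a, b in zip(rest_t, rest_s))
    return direct + fallback
```

For example, if the student tokenizes "foobar" as two pieces with probabilities 0.5 each while the teacher keeps it whole, `merge_logprobs` collapses the two student log-probs into one span score that is directly comparable to the teacher's.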