Add links to TRL docs
app/src/content/article.mdx
@@ -76,7 +76,7 @@ Building on Universal Logit Distillation (ULD) [@boizard2025crosstokenizerdistil
 
 Our key contributions are:
 
-- Providing an open-source implementation of on-policy distillation methods and proving they work for multiple model combinations.
+- Providing an open-source implementation of on-policy distillation methods in TRL ([GKD](https://huggingface.co/docs/trl/en/gkd_trainer) and [GOLD](https://huggingface.co/docs/trl/main/en/gold_trainer)) and proving they work for multiple model combinations.
- Extending ULD to the on-policy setting, where we sample completions from the student and align them to the teacher's distribution.
- Implementing new sequence and vocabulary alignment methods that improve distillation performance when the student and the teacher have different tokenizers.