on-policy-distillation

Experiments setup for reproduction

by charosen - opened Dec 24, 2025

Hi team! Great blog post, clear and easy to understand. To get started with on-policy distillation, I would like to reproduce the experiments from your blog post.

However, I ran into some trouble.

For the dataset, the blog post says there are 15.2k prompts for Qwen/Qwen2.5-7B-Instruct, but the HuggingFaceTB/Countdown-Task-GOLD dataset shows 30.4k rows, double the 15.2k. This is a little confusing to me.

cmpatino

Hugging Face H4 org Jan 21

edited Jan 21

Thank you for reading the blog post! I took a look, and the numbers were indeed inconsistent between the blog post and the dataset. There was an error in the paragraph describing the dataset, so I modified the text in this PR to make it clearer.

cmpatino changed discussion status to closed Jan 21
