Juan CM PRO

jucamohedano

AI & ML interests

AI Systems MSc at Trento 🚀🤖

Recent Activity

updated a model about 1 month ago

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora

updated a model about 1 month ago

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora_more_rollouts

published a model about 1 month ago

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora_more_rollouts

updated 2 models about 1 month ago

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora

Updated May 25

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora_more_rollouts

Updated May 25

published 2 models about 1 month ago

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora_more_rollouts

Updated May 25

jucamohedano/qwen2.5_vl_7b_oxford_pets_grpo_lora

Updated May 25

liked a dataset about 1 month ago

ychenNLP/oven

Updated Jul 1, 2024 • 2.11k • 16

updated a Space 2 months ago

ml-intern sandbox

🌍

updated a model 2 months ago

ml-intern-explorers/ssd-qwen3vl-oxfordpets

Updated Apr 22 • 1

published 3 buckets 2 months ago

published a model 2 months ago

ml-intern-explorers/ssd-qwen3vl-oxfordpets

Updated Apr 22 • 1

published a bucket 2 months ago

jucamohedano/trackio-bucket-6

0 Bytes

published a model 2 months ago

ml-intern-explorers/grpo-qwen3vl-oxfordpets

Updated Apr 22 • 1

reacted to sergiopaniego's post with 🔥 2 months ago

Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy.

And… it's already supported in TRL, built by Kashif Rasul. You can really feel the pace of development in the team 🐎

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. No labels or verifier are needed.
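To make the two steps concrete, here is a minimal single-token sketch in plain Python. It is a toy illustration, not the TRL trainer or the paper's code: the helper names (`truncated_probs`, `sample_token`, `cross_entropy`), the toy logits, and the parameter values are all hypothetical. It samples one token from a temperature-scaled, top_k/top_p-truncated distribution, then computes the plain cross-entropy loss the model would be trained on for that sampled token.

```python
import math
import random


def truncated_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Temperature-scale the logits, apply top-k then top-p truncation, renormalize."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out


def sample_token(probs, rng):
    """Draw one token index from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def cross_entropy(logits, target):
    """Plain cross-entropy of the untempered model logits against one target token."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


# Hypothetical toy values, chosen only for illustration.
rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, -1.0]
probs = truncated_probs(toy_logits, temperature=1.5, top_k=3, top_p=0.9)
token = sample_token(probs, rng)   # step 1: the model samples its own output
loss = cross_entropy(toy_logits, token)  # step 2: it is trained on that output
print(token, loss)
```

In a real run the same idea applies to whole completions: the sampled sequences become the training targets, so no reference labels or verifier enter the loss.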

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help.
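One way to see why the temperatures multiply is a small numeric check under two idealizing assumptions that are mine, not necessarily the paper's argument: no top_k/top_p truncation, and a student that fits the training distribution exactly. Under those assumptions the student's logits equal the base logits divided by T_train (up to a constant), so sampling it at T_eval is the same as sampling the base model at T_train × T_eval. The logits and temperature values below are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy values, chosen only for illustration.
base_logits = [2.0, 1.0, 0.5, -1.0]
t_train, t_eval = 1.5, 0.6

# Distribution the training samples are drawn from (no truncation assumed).
train_dist = softmax(base_logits, t_train)

# Idealized student: cross-entropy training is assumed to recover train_dist
# exactly, so its logits are log(train_dist) up to an additive constant.
student_logits = [math.log(p) for p in train_dist]

# Sampling the student at T_eval versus the base model at T_train * T_eval.
composed = softmax(student_logits, t_eval)
direct = softmax(base_logits, t_train * t_eval)

max_gap = max(abs(a - b) for a, b in zip(composed, direct))
print(max_gap)
```

The two distributions agree up to floating-point error. With truncation or an imperfectly trained student the match is only approximate.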

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer