violetxi/opd_tooluse_qwen3-4b_trained_teacher_forward_kl_teacher_step150_bs256_lr5e-6 4B • Updated 6 days ago • 20
ExpRL Collection Trained ExpRL checkpoints. Paper link: https://arxiv.org/abs/2606.17024 • 4 items • Updated 7 days ago