Trained ExpRL checkpoints. Paper link: https://arxiv.org/abs/2606.17024
Violet Xiang PRO
violetxi
Recent Activity
upvoted a paper 3 days ago
GLM-5: from Vibe Coding to Agentic Engineering
updated a model 3 days ago
violetxi/opd_tooluse_qwen3-4b_trained_teacher_forward_kl_bs256_lr5e-6
published a model 3 days ago
violetxi/opd_tooluse_qwen3-4b_trained_teacher_forward_kl_bs256_lr5e-6