Best train reward at step 15 (reward=0.217, pass@8=0.219). This is the exact best step. Base model: Qwen/Qwen3-32B, dataset: exp_rpt_stack-bash-withtests-gpt5mini.
751f73a