atutej
Best training reward at step 4 (reward=0.453, pass@8=0.678). Uploading the nearest HF export, from step 5 (reward=0.398). Base model: Qwen/Qwen3-32B; dataset: exp_rpt_codeelo-v2.
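The checkpoint choice in this message can be sketched in a few lines. The best reward was logged at step 4, but the closest available HF export is at step 5. This is a minimal sketch under stated assumptions: the function names, the export schedule, and the tie-breaking rule are hypothetical and are not taken from the actual training or upload code.

```python
from typing import Dict, Sequence


def best_step(rewards: Dict[int, float]) -> int:
    """Return the training step with the highest logged reward."""
    return max(rewards, key=lambda step: rewards[step])


def nearest_export(best: int, export_steps: Sequence[int]) -> int:
    """Return the exported step closest to `best`.

    Hypothetical tie-break: when two exports are equally close,
    prefer the later one (the -s term in the key).
    """
    return min(export_steps, key=lambda s: (abs(s - best), -s))


# Rewards from this commit message; the export schedule [5, 10] is an assumption.
rewards = {4: 0.453, 5: 0.398}
step = best_step(rewards)
export = nearest_export(step, [5, 10])
print(step, export)  # prints "4 5"
```

With the numbers above, step 4 wins on reward, and step 5 is the closest exported checkpoint. This is why the uploaded model reports reward=0.398 rather than the best value of 0.453.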
Commit 6e7a327 (verified)