Best train reward at step 4 (reward=0.453, pass@8=0.678). Uploading nearest HF export at step 5 (reward=0.398). Base model: Qwen/Qwen3-32B, dataset: exp_rpt_codeelo-v2.

Commit 6e7a327 (verified), committed by atutej on Apr 7