Run 6: forced variants (eps 50%→70%), β_rank=0.25, R-level bonus, μ=2 PPO epochs, balanced R1-R5 warmup traces
e198371 (verified), chane335 committed on Apr 25