fix: tighten reward/grade bounds to [0.01, 0.99] in models, orchestrator, and tests 9955cc3 williyam committed on Apr 28
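
For illustration only, the bound described in the commit title could be enforced with a clamp like the one below. This is a minimal sketch, not the code from the commit: the names `REWARD_MIN`, `REWARD_MAX`, and `clamp_reward` are hypothetical, and only the interval [0.01, 0.99] comes from the commit message.

```python
# Hypothetical constants; only the numeric bounds come from the commit title.
REWARD_MIN = 0.01
REWARD_MAX = 0.99


def clamp_reward(value: float) -> float:
    """Clamp a raw reward or grade into [REWARD_MIN, REWARD_MAX]."""
    # Cap at the upper bound first, then raise to the lower bound if needed.
    return max(REWARD_MIN, min(REWARD_MAX, value))


if __name__ == "__main__":
    # Values outside the interval are pulled to the nearest bound;
    # values already inside it pass through unchanged.
    for raw in (0.0, 0.5, 1.0):
        print(raw, "->", clamp_reward(raw))
```

Keeping the bounds in named constants would let models, orchestrator, and tests share one definition, though whether the repository does this is an assumption.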