fix: implementing strict prompt conditions for scores/reward to be in 0.0–1.0 range 89b370c samrat-rm committed 8 days ago
fix: clamp all rewards and scores to [0.10, 0.90] d3b224f samrat-rm and Claude Sonnet 4.6 committed 8 days ago
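
The log does not show the diff for d3b224f, so the following is only a minimal sketch of what clamping to [0.10, 0.90] could look like. The function name `clamp_score` and its parameters are hypothetical and are not taken from the repository.

```python
def clamp_score(value: float, low: float = 0.10, high: float = 0.90) -> float:
    """Clamp a reward or score into the closed interval [low, high].

    Hypothetical helper: the bounds default to the range named in the
    commit message, but the actual implementation may differ.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    # Values below `low` are raised to `low`; values above `high` are cut to `high`.
    return max(low, min(high, value))


if __name__ == "__main__":
    for raw in (-0.2, 0.5, 1.5):
        print(raw, "->", clamp_score(raw))
```

Under these assumptions, an out-of-range value such as 1.5 maps to 0.90 and -0.2 maps to 0.10, while in-range values pass through unchanged.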