Update inference scoring: sum rewards, clamp to (0,1), add score to log_end bb56035 xaheli committed on Apr 8
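
The change described in this commit message can be illustrated with a minimal sketch. The actual diff is not shown here, so everything below is an assumption: the name `compute_score`, the `eps` margin used to keep the result strictly inside the open interval (0,1), and the signature and output format of `log_end` are all hypothetical.

```python
from typing import Sequence

# Assumed margin so the clamped score stays strictly inside (0, 1).
# The commit message only says "clamp to (0,1)"; the real margin is unknown.
EPS = 1e-6


def compute_score(rewards: Sequence[float], eps: float = EPS) -> float:
    """Sum the per-step rewards and clamp the total into [eps, 1 - eps]."""
    total = sum(rewards)
    return min(max(total, eps), 1.0 - eps)


def log_end(rewards: Sequence[float], steps: int) -> str:
    """Hypothetical end-of-episode log line that includes the score."""
    score = compute_score(rewards)
    return f"[END] steps={steps} score={score:.4f}"


if __name__ == "__main__":
    print(log_end([0.2, 0.3], steps=2))
```

Clamping after summing, rather than clamping each reward, keeps the per-step values untouched and bounds only the final reported score; whether the original code does it in this order beyond what the message states is not confirmed.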