Normalize logged rewards to strict (0,1) for validator compatibility 688de63 Shinegupta commited on Apr 8
Align inference stdout to strict START/STEP/END key-value format f8bbb2a Shinegupta commited on Apr 7