fix: align step_reward with grade_episode, pin deps, update docs, clean inference 3f78483 padmapriyagosakan committed on Mar 31