Redesign reward for discrimination: efficiency multiplier, strict penalties, stretch bonus, start at level 1 46f0850 Aswini-Kumar committed on Apr 26
Data-Centric AI RL Environment — OpenEnv Hackathon Submission 71dc210 Aswini-Kumar committed on Apr 25