fix: clamp all rewards and scores to [0.10, 0.90] d3b224f samrat-rm and Claude Sonnet 4.6 committed 8 days ago
fix: clamp all score paths to (0.01, 0.99), fix reward field name, add per-task score line bf98c78 samrat-rm committed 8 days ago
fix: enforce reward bounds (0.01–0.99) and 2 decimal precision across grader, env, and inference 3781ce7 samrat-rm committed 8 days ago
feat: implement WhyDidItFailState for full OpenEnv state compliance ff8ce5f samrat-rm committed 8 days ago
fix: normalize underfitting gradient norms and guard vague-answer penalty 909dfde samrat-rm committed 8 days ago
fix: harden label rules to prevent missing_regularization misfires 3eeca00 samrat-rm committed 8 days ago
feat(scenarios): add real gradient norms and improve scenario discriminability a22393e samrat-rm and Claude Sonnet 4.6 committed 9 days ago
feat: update the logs with relevant model names to improve score function efficiency 88c0fc2 samrat-rm committed 9 days ago
feat: add step-count logic to encourage the agent to explore more 17a43d0 samrat-rm committed 9 days ago
feat: require a fix suggestion and penalize answers that omit it c6888af samrat-rm committed 9 days ago
feat(grade): upgrade inspected to inspected_order, which rewards steps taken in order a818334 samrat-rm committed 9 days ago
fix(grade): keyword matching and requires_fix flag for diagnosis scoring 9f554a9 samrat-rm committed 11 days ago
refactor: WhyDidItFailAction and WhyDidItFailObservation classes 87037e2 samrat-rm committed 11 days ago