fix: clamp all rewards and scores to [0.10, 0.90] d3b224f samrat-rm Claude Sonnet 4.6 committed 8 days ago
fix: enforce reward bounds (0.01–0.99) and 2 decimal precision across grader, env, and inference 3781ce7 samrat-rm committed 8 days ago
feat: implement WhyDidItFailState for full OpenEnv state compliance ff8ce5f samrat-rm committed 8 days ago
fix: tighten label rules for underfitting, overfitting, and vanishing gradients 25fff92 samrat-rm committed 8 days ago
feat: define WhyDidItFailAction and WhyDidItFailObservation models with typed fields and descriptions 5bf3c8c samrat-rm committed 11 days ago