WhyDidItFail / server

Commit History

chore: update doc string
6b279f6

samrat-rm committed on

fix: clamp all rewards and scores to [0.10, 0.90]
d3b224f

samrat-rm Claude Sonnet 4.6 committed on

fix: clamp all score paths to (0.01, 0.99), fix reward field name, add per-task score line
bf98c78

samrat-rm committed on

fix: enforce reward bounds (0.01–0.99) and 2 decimal precision across grader, env, and inference
3781ce7

samrat-rm committed on

fix: reward scores are updated to be between 0 and 1
c130122

samrat-rm committed on

chore: logs format update
e7b5e0d

samrat-rm committed on

feat: implement WhyDidItFailState for full OpenEnv state compliance
ff8ce5f

samrat-rm committed on

chore: clean up all the unnecessary comments
afa4b9d

samrat-rm committed on

feat: openEnv playground UI basic implementation
f7c4516

samrat-rm committed on

fix: normalize underfitting gradient norms and guard vague-answer penalty
909dfde

samrat-rm committed on

feat: add judge fallback
53f3a58

samrat-rm committed on

refactor: moving llm judge inside server dir
149177d

samrat-rm committed on

feat: add playground static file serving
aac6b30

samrat-rm committed on

fix: harden label rules to prevent missing_regularization misfires
3eeca00

samrat-rm committed on

feat(scenarios): add real gradient norms and improve scenario discriminability
a22393e

samrat-rm Claude Sonnet 4.6 committed on

feat: upgrading the inspect feedback function
d29cfdb

samrat-rm committed on

feat: updating the logs with relevant model names for improving score function efficiency
88c0fc2

samrat-rm committed on

feat: updating the evidence scoring function
a91fb6a

samrat-rm committed on

feat: adding step count logic to encourage the agent to explore more
17a43d0

samrat-rm committed on

feat: max step limit
1288c52

samrat-rm committed on

feat: fix suggestion is required and not providing fix causes penalty
c6888af

samrat-rm committed on

feat: ordering_bonus function implementation
236cf5b

samrat-rm committed on

feat(grade): inspected is upgraded to inspected_order. It rewards steps taken in order
a818334

samrat-rm committed on

refactor: Aligning the env with the new grade function
e216a2f

samrat-rm committed on

feat: adding more scenarios
a73576c

samrat-rm committed on

feat: grade function refactor and additional features
740ac53

samrat-rm committed on

fix: add state() method
3613ecf

samrat-rm committed on

feat: init graders and implement grade_easy() in env
243b472

samrat-rm committed on

chore: import statement refactor
04666da

samrat-rm committed on

fix: seed and episode_id in reset()
a0518e7

samrat-rm committed on

fix(grade): keyword matching and requires_fix flag for diagnosis scoring
9f554a9

samrat-rm committed on

feat: initial environment setup
572e42a

samrat-rm committed on

refactor: WhyDidItFailEnvironment class name
d08def9

samrat-rm committed on

refactor: WhyDidItFailAction and WhyDidItFailObservation classes
87037e2

samrat-rm committed on

feat: Initialised scenarios
a80823e

samrat-rm committed on

Initial commit
b37875f

samrat-rm committed on