Spaces: samrat-rm / WhyDidItFail

WhyDidItFail / server

Commit History

chore: update doc string

6b279f6

samrat-rm committed 8 days ago

fix: clamp all rewards and scores to [0.10, 0.90]

d3b224f

samrat-rm and Claude Sonnet 4.6 committed 8 days ago

fix: clamp all score paths to (0.01, 0.99), fix reward field name, add per-task score line

bf98c78

samrat-rm committed 8 days ago

fix: enforce reward bounds (0.01–0.99) and 2 decimal precision across grader, env, and inference

3781ce7

samrat-rm committed 8 days ago

fix: reward scores are updated to be between 0 and 1

c130122

samrat-rm committed 8 days ago

chore: logs format update

e7b5e0d

samrat-rm committed 8 days ago

feat: implement WhyDidItFailState for full OpenEnv state compliance

ff8ce5f

samrat-rm committed 8 days ago

chore: clean up all the unnecessary comments

afa4b9d

samrat-rm committed 8 days ago

feat: openEnv playground UI basic implementation

f7c4516

samrat-rm committed 8 days ago

fix: normalize underfitting gradient norms and guard vague-answer penalty

909dfde

samrat-rm committed 8 days ago

feat: add judge fallback

53f3a58

samrat-rm committed 8 days ago

refactor: moving llm judge inside server dir

149177d

samrat-rm committed 8 days ago

feat: add playground static file serving

aac6b30

samrat-rm committed 8 days ago

fix: harden label rules to prevent missing_regularization misfires

3eeca00

samrat-rm committed 8 days ago

feat(scenarios): add real gradient norms and improve scenario discriminability

a22393e

samrat-rm and Claude Sonnet 4.6 committed 9 days ago

feat: upgrading the inspect feedback function

d29cfdb

samrat-rm committed 9 days ago

feat: updating the logs with relevant model names for improving score function efficiency

88c0fc2

samrat-rm committed 9 days ago

feat: updating the evidence scoring function

a91fb6a

samrat-rm committed 9 days ago

feat: adding steps count logic to encourage the agent explore more

17a43d0

samrat-rm committed 9 days ago

feat: max step limit

1288c52

samrat-rm committed 9 days ago

feat: fix suggestion is required and not providing fix causes penalty

c6888af

samrat-rm committed 9 days ago

feat: ordering_bonus function implementation

236cf5b

samrat-rm committed 9 days ago

feat(grade): inspected is upgraded to inspected_order. It rewards steps taken in order

a818334

samrat-rm committed 9 days ago

refactor: Aligning the env with the new grade function

e216a2f

samrat-rm committed 9 days ago

feat: adding more scenarios

a73576c

samrat-rm committed 9 days ago

feat: grade function refactor and additional features

740ac53

samrat-rm committed 9 days ago

fix: add state() method

3613ecf

samrat-rm committed 10 days ago

feat: init graders and implement grade_easy() in env

243b472

samrat-rm committed 10 days ago

chore: import statement refactor

04666da

samrat-rm committed 10 days ago

fix: seed and episode_id in reset()

a0518e7

samrat-rm committed 10 days ago

fix(grade): keyword matching and requires_fix flag for diagnosis scoring

9f554a9

samrat-rm committed 11 days ago

feat: initial environment setup

572e42a

samrat-rm committed 11 days ago

refactor: WhyDidItFailEnvironment class name

d08def9

samrat-rm committed 11 days ago

refactor: WhyDidItFailAction and WhyDidItFailObservation classes

87037e2

samrat-rm committed 11 days ago

feat: Initialised sceanrios

a80823e

samrat-rm committed 11 days ago

Initial commit

b37875f

samrat-rm committed 11 days ago
