Commit History

chore: removing logs
1dce05c

samrat-rm committed on

fix: remove exception
43c1c2a

samrat-rm committed on

feat: rewards upgrade
61e83f1

samrat-rm committed on

fix: implementing strict prompt conditions for scores/reward to be in 0.0–1.0 range
89b370c

samrat-rm committed on

feat: updating prompt and reward condition
f58e721

samrat-rm committed on

chore: updating the [END] log
87f9568

samrat-rm committed on

feat: LLM agent model change
3ef6b97

samrat-rm committed on

chore: update doc string
6b279f6

samrat-rm committed on

fix: clamp all rewards and scores to [0.10, 0.90]
d3b224f

samrat-rm Claude Sonnet 4.6 committed on

fix: clamp all score paths to (0.01, 0.99), fix reward field name, add per-task score line
bf98c78

samrat-rm committed on

fix: harden grader prompts to prevent out-of-range scores
7a56cd1

samrat-rm committed on

fix: reduced the number of scenarios in inference
196955c

samrat-rm committed on

fix: logs in inference
c348367

samrat-rm committed on

chore: doc string update and remove unused import
0252dc5

samrat-rm committed on

fix: score condition
d933934

samrat-rm committed on

fix: score format and range
a583a04

samrat-rm committed on

fix: score range
26f0b41

samrat-rm committed on

fix: enforce reward bounds (0.01–0.99) and 2 decimal precision across grader, env, and inference
3781ce7

samrat-rm committed on

fix: reward scores are updated to be between 0 and 1
c130122

samrat-rm committed on

chore: code cleanup
2014a9f

samrat-rm committed on

chore: logs format update
e7b5e0d

samrat-rm committed on

chore: updating logs
faf4fb8

samrat-rm committed on

fix: rewards field shows single composite score instead of step CSV
f74015b

samrat-rm committed on

chore: 2 scenarios per difficulty - inference
a884ccb

samrat-rm committed on

chore: only one scenario is run per difficulty
d6bb519

samrat-rm committed on

chore: restrict stdout to START/STEP/END for eval compliance
87b840b

samrat-rm committed on

fix: cast fallback action_type to Literal for Pylance compliance and remove image from repo root
26630c7

samrat-rm committed on

docs: updating readme with state changes and test
f0681d9

samrat-rm committed on

feat: implement WhyDidItFailState for full OpenEnv state compliance
ff8ce5f

samrat-rm committed on

docs: Update README.md
15f091b

samrat-rm committed on

docs: updating the readme with AI usage disclosure section
608b10a

samrat-rm committed on

docs: adding detailed docs for agent_prompt , grade and scenarios
6338fc0

samrat-rm committed on

docs: expand setup section, fix factual errors, add features summary
e6bf1cd

samrat-rm committed on

feat: updating the readme
bece8d8

samrat-rm committed on

fix: minor refactor of env class name
8e889bd

samrat-rm committed on

chore: clean up all the unnecessary comments
afa4b9d

samrat-rm committed on

Merge pull request #2 from samrat-rm/feat/llm-judge
696784a

samrat-rm committed on

chore: updating logs
051a1af

samrat-rm committed on

feat: openEnv playground UI basic implementation
f7c4516

samrat-rm committed on

feat: updated the yaml file with tasks for evaluation
c6913d5

samrat-rm committed on

fix: comply with openenv stdout spec, preserve inspection data in history, sharpen medium-tier label rules
77f9568

samrat-rm committed on

fix: error handling for episode run loop
aa1c27d

samrat-rm committed on

fix: updating the logs to align with the evaluation standards
2ae2b18

samrat-rm committed on

fix: error handling for HF_TOKEN env
5ca33a3

samrat-rm committed on

feat: update the readme.md
8f1e681

samrat-rm committed on

fix: tighten label rules for underfitting, overfitting, and vanishing gradients
25fff92

samrat-rm committed on

fix: normalize underfitting gradient norms and guard vague-answer penalty
909dfde

samrat-rm committed on

feat: add judge fallback
53f3a58

samrat-rm committed on

refactor: moving llm judge inside server dir
149177d

samrat-rm committed on

feat: add playground static file serving
aac6b30

samrat-rm committed on