Commit History
fix: remove exception 43c1c2a
feat: rewards upgrade 61e83f1
fix: implementing strict prompt conditions for scores/reward to be in 0.0–1.0 range 89b370c
feat: updating prompt and reward condition f58e721
chore: updating the [END] log 87f9568
feat: LLM agent model change 3ef6b97
chore: update doc string 6b279f6
fix: clamp all rewards and scores to [0.10, 0.90] d3b224f
fix: clamp all score paths to (0.01, 0.99), fix reward field name, add per-task score line bf98c78
fix: harden grader prompts to prevent out-of-range scores 7a56cd1
fix: reduced the number of scenarios in inference 196955c
fix: logs in inference c348367
chore: doc string update and remove unused import 0252dc5
fix: score condition d933934
fix: score format and range a583a04
fix: score range 26f0b41
fix: enforce reward bounds (0.01–0.99) and 2 decimal precision across grader, env, and inference 3781ce7
fix: reward scores are updated to be between 0 and 1 c130122
chore: code cleanup 2014a9f
chore: logs format update e7b5e0d
chore: updating logs faf4fb8
fix: rewards field shows single composite score instead of step CSV f74015b
chore: 2 scenarios per difficulty - inference a884ccb
chore: only one scenario is run per difficulty d6bb519
chore: restrict stdout to START/STEP/END for eval compliance 87b840b
fix: cast fallback action_type to Literal for Pylance compliance and remove image from repo root 26630c7
docs: updating readme with state changes and test f0681d9
feat: implement WhyDidItFailState for full OpenEnv state compliance ff8ce5f
docs: Update README.md 15f091b
samrat-rm committed on
docs: updating the readme with AI usage disclosure section 608b10a
docs: adding detailed docs for agent_prompt , grade and scenarios 6338fc0
docs: expand setup section, fix factual errors, add features summary e6bf1cd
feat: updating the readme bece8d8
fix: minor refactor of env class name 8e889bd
chore: clean up all the unnecessary comments afa4b9d
Merge pull request #2 from samrat-rm/feat/llm-judge 696784a
samrat-rm committed on