feat: Implement wait_for_updates action for handling delayed cases and evidence 2dedffd mitudrudutta committed on Apr 23
feat: tighten EscalationROI, add ambiguous medium case, LLM note judge wrapper e32a33b mitudrudutta committed on Apr 19
Add training notebook and benchmark runner for ChargebackOps bd00c06 pauldebanshu19 committed on Apr 19
refactor: tighten rubric discrimination + LLM path + add running doc 0054f7f mitudrudutta committed on Apr 15
Refactor evidence building and improve code readability in iso_adapter.py 37bfd28 mitudrudutta committed on Apr 12
fix: add [START]/[STEP]/[END] structured output to inference.py 388e3b8 mitudrudutta committed on Apr 7
feat: add OpenEnv overview documentation and enhance harmful evidence detection keywords 64cb3ce mitudrudutta committed on Mar 30
feat: harden grading, expand task catalog, add episode persistence 87c40c2 mitudrudutta committed on Mar 30
refactor: reorganize source files into core/, evaluation/, runners/, scenarios/ directories 3816847 mitudrudutta committed on Mar 29