ChargeBackOps / evaluation

Commit History

feat: Implement wait_for_updates action for handling delayed cases and evidence
2dedffd

mitudrudutta committed on

feat: tighten EscalationROI, add ambiguous medium case, LLM note judge wrapper
e32a33b

mitudrudutta committed on

feat: Implement multi-round dispute lifecycle with arbitration scoring and related tests
b7aa1f0

pauldebanshu19 committed on

refactor: update difficulty levels and enhance scoring rubrics in documentation and code
3149b7e

mitudrudutta committed on

Refactor evidence building and improve code readability in iso_adapter.py
37bfd28

mitudrudutta committed on

refactor: build grading on OpenEnv Rubric system
c8ebaee

mitudrudutta committed on

fix: squash inflated evidence scores for wrongly contested concedable cases
7eba019

mitudrudutta committed on

feat: harden grader to penalise shallow operational behaviour
544c8b2

mitudrudutta committed on

feat: make note grading contextual per-case harmful evidence
8d14c02

mitudrudutta committed on

feat: harden grading, expand task catalog, add episode persistence
87c40c2

mitudrudutta committed on

refactor: reorganize source files into core/, evaluation/, runners/, scenarios/ directories
3816847

mitudrudutta committed on