Fix dep_hard Counter bug, add fatal error handling, update README with 14-model benchmark 3466d21 immortalindeed committed on Apr 10
Major grading overhaul: difficulty multiplier, tighter scoring, mastery removal, precision penalties 72b3e8d immortalindeed committed on Apr 10
Fix state machine bugs and switch to average scoring for discriminative benchmarking cd5104a immortalindeed committed on Apr 10
fix(benchmark): Hardening multi-agent environment and strict score compliance 6f95f2a immortalindeed committed on Apr 9