Fix grader import path: use root-level graders module instead of server.graders 220acb1 padmapriyagosakan committed on Apr 12
Add score:0.5 to all YAML tasks, make graders callable without args, fix health status f4fa44f padmapriyagosakan committed on Apr 12
Iteration_4: expand task bank 20→30, add variants for all tasks, fix baseline policy for chain-gated CRIT tasks ffea7f4 padmapriyagosakan committed on Apr 2
feat: enforce investigation discipline + fix easy-task grading + add investigation_hints 622e841 padmapriyagosakan committed on Mar 30
feat: reproducible baseline — fixed seed, correct_action in table, accuracy summary 9ec66a4 padmapriyagosakan committed on Mar 30
fix: per-task reward display uses weighted_reward key from grader 2cf9fa0 padmapriyagosakan committed on Mar 30
feat: run LLM inference baseline + fix SSL and loop guard in inference.py c0df82b padmapriyagosakan committed on Mar 30
docs: fix README gaps - complete action/observation tables and task descriptions 36cf9b5 padmapriyagosakan committed on Mar 30
feat: Iteration_2 — trajectory reward shaping, episode jitter, flag identification 9c003f0 padmapriyagosakan committed on Mar 27
chore: pre-iteration-1 snapshot — 75/75 e2e passing, baseline bug fixed 0f139ff padmapriyagosakan committed on Mar 27