add compare_agents.py: 4-way benchmark (Random/Heuristic/SFT/GRPO) 2968ead israaaML Claude Sonnet 4.6 committed on Mar 8
fix: sanitize numpy/pandas types in submit_solution JSON serialization 3ce0714 israaaML Claude Sonnet 4.6 committed on Mar 8
v3: benchmark results, final report, agent/eval improvements, smoke test fixes b3fc5ee israaaML Claude Sonnet 4.6 committed on Mar 8
v2: curriculum scheduling, SFT pipeline, reward redesign, agent guide 16038fc israaaML Claude Sonnet 4.6 committed on Mar 8