add compare_agents.py: 4-way benchmark (Random/Heuristic/SFT/GRPO) 2968ead israaaML Claude Sonnet 4.6 committed on 1 day ago
fix: sanitize numpy/pandas types in submit_solution JSON serialization 3ce0714 israaaML Claude Sonnet 4.6 committed on 1 day ago
v3: benchmark results, final report, agent/eval improvements, smoke test fixes b3fc5ee israaaML Claude Sonnet 4.6 committed on 1 day ago
v2: curriculum scheduling, SFT pipeline, reward redesign, agent guide 16038fc israaaML Claude Sonnet 4.6 committed on 2 days ago