add compare_agents.py: 4-way benchmark (Random/Heuristic/SFT/GRPO) 2968ead israaaML Claude Sonnet 4.6 committed 1 day ago