Phase 9: baseline eval harness (heuristic + LLM agents) + tests 4d2f869 yashash04 committed 21 days ago