Simulate cross-session coding tasks for LLM agents
LLM learns to fix data, not models: a GRPO RL environment.