CIBench Public Demo 📐 Replayable long-context benchmark engine (manifest-first)