# scripts/

Utility scripts bundled with the Headroom repo. Most are one-off operator tools; a few are runnable as part of development workflows.

## Reproducing the reconnect storm

`repro_codex_replay.py` reproduces the multi-agent Codex reconnect/retry storm against a local Headroom proxy (default `http://127.0.0.1:8787`), as described in `wiki/plans/2026-04-17-codex-proxy-runtime-analysis.md` under "Latest Correction".

Use it to:

- Regression-check that `/livez` stays responsive under a cold-start storm.
- Empirically tune the Unit 4 pre-upstream semaphore default (`HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY`).
- Exercise the Codex WS lifecycle and the Anthropic HTTP path simultaneously, without needing to replay captured production traffic.

### Run

```bash
# Default: 8 WS + 4 HTTP clients, 30s storm, p99 /livez must stay <= 500ms.
python scripts/repro_codex_replay.py

# Heavier, longer storm with a tighter /livez budget:
python scripts/repro_codex_replay.py \
  --url http://127.0.0.1:8787 \
  --ws-clients 16 \
  --anthropic-clients 8 \
  --duration 60 \
  --livez-threshold-ms 100

# Dump the full summary as JSON for downstream tooling:
python scripts/repro_codex_replay.py --json
```

Exit codes:

- `0`: warmup succeeded (or was skipped), the storm ran for the requested duration, and `/livez` p99 stayed under `--livez-threshold-ms`.
- `1`: a soft assertion failed, the proxy was unreachable, or an unhandled exception occurred. An unreachable proxy is detected and reported within about 5 seconds.

### Fixtures

The script loads two hand-crafted, fully synthetic JSON fixtures:

- `scripts/fixtures/anthropic_replay_body.json`: the shape of a large agent-reconnect replay `/v1/messages?beta=true` POST body.
- `scripts/fixtures/codex_response_create_frame.json`: the first Codex WS frame, with the `{"type": "response.create", "response": {...}}` envelope.

Override them with `--ws-frame-fixture` / `--anthropic-body-fixture` if you have captured traffic to replay instead.
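When swapping in captured traffic via `--ws-frame-fixture` / `--anthropic-body-fixture`, a quick pre-flight check can catch a truncated or hand-edited capture before it reaches the proxy. The sketch below is hypothetical: `check_fixture` and `replay_captured` are not part of the repo, it assumes both override flags take a file path, and it only verifies that each file is well-formed JSON (it does not check that the WS frame carries the `response.create` envelope).

```bash
# Hypothetical helpers (not part of the repo) for replaying captured traffic.
# Assumes both override flags accept a file path.

# Return non-zero (with a message on stderr) unless $1 is a readable,
# well-formed JSON file. This does not check the payload's shape.
check_fixture() {
  if ! python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "not valid JSON: $1" >&2
    return 1
  fi
}

# Validate both captures, then run the storm with them. Any arguments after
# the two paths (e.g. --duration 60) are forwarded to the script unchanged.
replay_captured() {
  local ws_frame=$1 http_body=$2
  shift 2
  check_fixture "$ws_frame" || return 1
  check_fixture "$http_body" || return 1
  python3 scripts/repro_codex_replay.py \
    --ws-frame-fixture "$ws_frame" \
    --anthropic-body-fixture "$http_body" \
    "$@"
}
```

For example, `replay_captured captures/codex_frame.json captures/anthropic_body.json --duration 60`, where the `captures/` paths are placeholders for wherever you keep your captures. Validating up front is meant to keep a malformed capture from showing up later as a confusing storm failure.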
### Interpretation

- `/livez p99` under the threshold means the event loop is not starved during the storm. If p99 rises with the semaphore effectively unbounded (`HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY=10000`) and falls back under the threshold when the default is restored, Unit 4's backpressure is working.
- `Codex WS: opened` should equal `--ws-clients`. `response.completed` typically stays low when upstream auth isn't configured locally; the goal is to exercise the handshake and relay wiring, not real upstream traffic.
- `Anthropic HTTP: ok_2xx + non_2xx + timed_out + errors` should roughly equal `attempted`. A sustained non-zero `timed_out` count during the storm is the failure signal the plan targets.

A smoke test at `tests/test_scripts/test_repro_codex_replay_smoke.py` exercises the script against a mock FastAPI server on every PR.

## Install scripts

- `install.sh`: POSIX installer.
- `install.ps1`: Windows PowerShell installer.

These are generated by the release pipeline; edit with care.