# scripts/

Utility scripts bundled with the Headroom repo. Most are one-off operator
tools; a few are runnable as part of development workflows.

## Reproducing the reconnect storm

`repro_codex_replay.py` reproduces the multi-agent Codex reconnect/retry storm
against a local Headroom proxy (default `http://127.0.0.1:8787`), as described
in `wiki/plans/2026-04-17-codex-proxy-runtime-analysis.md` under "Latest
Correction". Use it to:

- Regression-check that `/livez` stays responsive under a cold-start storm.
- Empirically tune the Unit 4 pre-upstream semaphore default
  (`HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY`).
- Exercise the Codex WS lifecycle + Anthropic HTTP path simultaneously
  without needing to replay captured production traffic.

### Run

```bash
# Default: 8 WS + 4 HTTP clients, 30s storm, p99 /livez must stay <= 500ms.
python scripts/repro_codex_replay.py

# Tighter /livez budget, larger and longer storm:
python scripts/repro_codex_replay.py \
  --url http://127.0.0.1:8787 \
  --ws-clients 16 \
  --anthropic-clients 8 \
  --duration 60 \
  --livez-threshold-ms 100

# Dump the full summary as JSON for downstream tooling:
python scripts/repro_codex_replay.py --json
```

Exit code:

- `0`: warmup succeeded (or was skipped), storm ran for the requested
  duration, and `/livez` p99 stayed under `--livez-threshold-ms`.
- `1`: soft assertion failed, proxy unreachable, or unhandled exception.
  Proxy-unreachable is detected and reported within ~5 seconds.

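
The exit-code contract above makes the script easy to wrap from other tooling. The following is a minimal sketch, not part of the repo: `build_command`, `describe_exit`, and `run_storm` are hypothetical helper names, and only the script path, the flags, and the `0`/`1` meanings are taken from this README.

```python
import subprocess
import sys

# Hypothetical wrapper; only the script path, the flags, and the 0/1
# exit-code contract come from this README.
SCRIPT = "scripts/repro_codex_replay.py"


def build_command(url="http://127.0.0.1:8787", livez_threshold_ms=500, duration=30):
    """Build the argv list for one storm run using flags documented above."""
    return [
        sys.executable, SCRIPT,
        "--url", url,
        "--duration", str(duration),
        "--livez-threshold-ms", str(livez_threshold_ms),
    ]


def describe_exit(code):
    """Map the documented exit codes to a short human-readable verdict."""
    if code == 0:
        return "pass: storm completed and /livez p99 stayed under threshold"
    if code == 1:
        return "fail: soft assertion failed, proxy unreachable, or unhandled exception"
    return f"unexpected exit code {code}"


def run_storm(**kwargs):
    """Run the script from a repo checkout and return (exit_code, verdict)."""
    result = subprocess.run(build_command(**kwargs), check=False)
    return result.returncode, describe_exit(result.returncode)
```

Calling `run_storm(duration=10)` from the repo root would run a short storm and return the raw exit code alongside its verdict. Because exit code `1` covers several distinct failures, a wrapper like this still needs the script's own output to tell them apart.
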
### Fixtures

The script loads two hand-crafted, fully synthetic JSON fixtures:

- `scripts/fixtures/anthropic_replay_body.json`: shape of a large agent
  reconnect replay `/v1/messages?beta=true` POST body.
- `scripts/fixtures/codex_response_create_frame.json`: first Codex WS frame
  with the `{"type": "response.create", "response": {...}}` envelope.

Override via `--ws-frame-fixture` / `--anthropic-body-fixture` if you have
captured traffic to replay instead.

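
Before passing a captured frame to `--ws-frame-fixture`, it can help to confirm that it still has the envelope described above. This is a hedged sketch: `check_response_create_frame` and `load_and_check` are hypothetical helpers, and the check covers only the two top-level keys this README names. Whatever the script itself validates is not documented here.

```python
import json


def check_response_create_frame(frame):
    """Return a list of problems; an empty list means the envelope looks right."""
    if not isinstance(frame, dict):
        return ["frame is not a JSON object"]
    problems = []
    # The README documents a {"type": "response.create", "response": {...}} envelope.
    if frame.get("type") != "response.create":
        problems.append('"type" is not "response.create"')
    if not isinstance(frame.get("response"), dict):
        problems.append('"response" is missing or not an object')
    return problems


def load_and_check(path):
    """Load a JSON fixture from disk and run the envelope check on it."""
    with open(path, encoding="utf-8") as fh:
        return check_response_create_frame(json.load(fh))
```

For example, `load_and_check("scripts/fixtures/codex_response_create_frame.json")` should return an empty list for the bundled fixture if it matches the documented envelope.
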
### Interpretation

- `/livez p99` under threshold means the event loop is not starved during the
  storm. If it rises with the semaphore effectively unbounded
  (`HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY=10000`) and drops back with
  the default value, Unit 4's backpressure is working.
- `Codex WS: opened` should equal `--ws-clients`. `response.completed`
  typically stays low when upstream auth isn't configured locally; the goal
  is handshake + relay wiring, not real upstream traffic.
- `Anthropic HTTP: ok_2xx + non_2xx + timed_out + errors` should roughly equal
  `attempted`. Sustained non-zero `timed_out` during the storm is the failure
  signal the plan targets.

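
The two numeric checks above can be sketched as follows. This illustration rests on assumptions: the script's actual percentile method is not documented here, so a nearest-rank p99 is used, and the counter names are borrowed from the summary line above without knowing whether the `--json` output uses the same keys.

```python
def p99(samples_ms):
    """Nearest-rank 99th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.99 * n gives the 1-based nearest rank.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def unaccounted_requests(counters):
    """Return attempted minus the four outcome buckets; near zero is expected."""
    # Assumed dict shape; the key names mirror the README's summary line.
    outcomes = sum(counters[k] for k in ("ok_2xx", "non_2xx", "timed_out", "errors"))
    return counters["attempted"] - outcomes
```

A small non-zero result from `unaccounted_requests` is consistent with "roughly equal", since some requests may still be in flight when the storm ends. A large gap would suggest requests are being lost without landing in any bucket.
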
A smoke test at `tests/test_scripts/test_repro_codex_replay_smoke.py`
exercises the script against a mock FastAPI server on every PR.

## Install scripts

- `install.sh`: POSIX installer.
- `install.ps1`: Windows PowerShell installer.

These are generated by the release pipeline; edit with care.