# W10 verification harness: pre-built scripts
The four scripts below are the W10-PREP deliverables: they let four W10
sub-agents kick off in parallel the moment W9-A (JudgePanel attestation)
and W9-E (final wiring) land. **Do not run these scripts before W9-A and
W9-E are merged**: the chain-consistency sweep relies on the
`events.judges_attestation_tx` column W9-A introduces, and the UI
regression / stress audit scripts assume the W9-A & W9-E UX surfaces are
present on `/events/*`.
All four scripts are idempotent and safe to re-run.
---
## 1. `scripts/w10/chain_consistency_sweep.py`
**Purpose.** Sweep the latest N `SUBMITTED` events through
`scripts/verify_chain_consistency.py` (wrapped, not modified) and
aggregate the per-phase PASS/FAIL/SKIP counts. For every FAIL the
verifier diff is included verbatim plus a heuristic "root cause" line.
**Invocation:**
```bash
.venv/bin/python scripts/w10/chain_consistency_sweep.py # 20 events, any mode
.venv/bin/python scripts/w10/chain_consistency_sweep.py --limit 30 --mode live
.venv/bin/python scripts/w10/chain_consistency_sweep.py --mode mock --out /tmp/w10-chain-mock.md
```
**Output:** `/tmp/w10-chain-sweep.md` (override with `--out`).
**Console:** per-phase tally summary at the end.
**Exit code:** 0 if every checked phase is PASS, 1 otherwise.
**Expected runtime:** ~6-10 s per event (most of that is RPC). 20 events
take ~2-3 min on the Arc testnet RPC.
**Dependencies:** Python venv at `.venv/`, `web3` (already pinned in
`pyproject.toml`), `.env` populated with the contract addresses
(`TRANSLATION_AUCTION_ADDRESS`, `JUDGE_PANEL_ADDRESS`, …). All present.
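
The per-phase tally and the exit-code rule described above can be sketched roughly as follows. This is not the sweep script's actual code: the `(event_id, phase, status)` tuple shape, the phase names, and the helper names `tally_phases` and `sweep_exit_code` are assumptions made for illustration, and the sketch treats `SKIP` as "not checked".

```python
from collections import Counter

STATUSES = ("PASS", "FAIL", "SKIP")


def tally_phases(results):
    """Group (event_id, phase, status) tuples into {phase: Counter of statuses}."""
    tally = {}
    for _event_id, phase, status in results:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        tally.setdefault(phase, Counter())[status] += 1
    return tally


def sweep_exit_code(tally):
    """Return 0 if no phase recorded a FAIL, 1 otherwise (SKIP counts as unchecked)."""
    return 1 if any(counts["FAIL"] for counts in tally.values()) else 0


# Hypothetical sample: event IDs and phase names are made up for illustration.
sample = [
    (214, "phase_1", "PASS"),
    (214, "phase_2", "SKIP"),
    (213, "phase_1", "FAIL"),
]
tally = tally_phases(sample)
for phase in sorted(tally):
    print(phase, " ".join(f"{s}={tally[phase][s]}" for s in STATUSES))
print("exit code:", sweep_exit_code(tally))
```

Keeping the tally keyed by phase makes the end-of-run console summary a simple loop over the dictionary.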
---
## 2. `ui/scripts/w10/ui_regression_sweep.mjs`
**Purpose.** Re-verify the 13 W3-regression points (from
`scripts/wave3_regression.mjs`) against the current UI in BOTH `mode=live`
and `mode=mock`. Aggregates console.errors / 4xx-5xx / 429 across both
runs.
**Invocation:**
```bash
node ui/scripts/w10/ui_regression_sweep.mjs
# override target events:
W10_EVENT_LIVE=214 W10_EVENT_MOCK=213 node ui/scripts/w10/ui_regression_sweep.mjs
```
**Output:** `/tmp/w10-ui-regression.md` (per-fix PASS/FAIL matrix +
console / network sections).
**Exit code:** 0 if every check passes in both modes, 1 if any fails.
**Expected runtime:** ~90-120 s (13 checks × 2 modes × page navigation).
**Dependencies:** UI on `:3001` and API on `:8000`. Playwright + Chromium
already installed under `ui/node_modules`. No new deps.
**Sample matrix row (expected when clean):**
```
| R1 | Phase 4 judge panel renders 11 judges | PASS | PASS |
| R9 | SSE rate-limit: 5 rapid reloads → 0 × 429 | PASS | PASS |
```
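
For readers consuming the report, a minimal sketch of how such a row and the exit code could be derived from per-mode results. The real sweep is a Node script; this Python version only illustrates the rule. The column order (live, then mock), the `results` dictionary shape, and the function names are assumptions.

```python
def matrix_row(check_id, description, live_ok, mock_ok):
    """Render one Markdown matrix row; column order (live, mock) is assumed."""
    def cell(ok):
        return "PASS" if ok else "FAIL"
    return f"| {check_id} | {description} | {cell(live_ok)} | {cell(mock_ok)} |"


def regression_exit_code(results):
    """results maps check ID -> {"live": bool, "mock": bool}; 0 only if all pass."""
    return 0 if all(r["live"] and r["mock"] for r in results.values()) else 1


results = {
    "R1": {"live": True, "mock": True},
    "R9": {"live": True, "mock": False},  # hypothetical failure in mock mode
}
print(matrix_row("R1", "Phase 4 judge panel renders 11 judges", True, True))
print("exit code:", regression_exit_code(results))
```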
---
## 3. `scripts/w10/test_suite_runner.sh`
**Purpose.** Run pytest + jest + `tsc --noEmit` one after the other,
tail-trim each log into `/tmp`, then print a summary block.
**Invocation:**
```bash
bash scripts/w10/test_suite_runner.sh
bash scripts/w10/test_suite_runner.sh --no-jest # skip UI suite
bash scripts/w10/test_suite_runner.sh --no-pytest # skip backend
```
**Outputs:**
- `/tmp/w10-pytest.log`: last 80 lines of pytest stdout
- `/tmp/w10-jest.log`: last 60 lines of jest output
- `/tmp/w10-tsc.log`: last 40 lines of `tsc --noEmit`
- `/tmp/w10-test-suite.summary`: distilled pass/fail counts
**Exit code:** 0 if all three suites pass and `tsc` reports 0 errors;
non-zero otherwise.
**Expected runtime:** pytest ≈ 60-90 s, jest ≈ 30-45 s, tsc ≈ 20-30 s;
total ≈ 2-3 min.
**Dependencies:** `.venv/bin/python` (fallback: `python3`),
`ui/node_modules/.bin/jest` and `tsc` (already installed).
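
The runner itself is a shell script, but the tail-trim step can be illustrated in Python. The sketch below keeps only the last `keep` lines of a log, in the spirit of the 80/60/40-line limits listed above; the `tail_trim` helper and the demo file names are hypothetical.

```python
import tempfile
from collections import deque
from pathlib import Path


def tail_trim(src, dst, keep):
    """Write the last `keep` lines of `src` to `dst`; return how many were kept."""
    with open(src, encoding="utf-8", errors="replace") as fh:
        last = deque(fh, maxlen=keep)  # deque discards older lines as it fills
    Path(dst).write_text("".join(last), encoding="utf-8")
    return len(last)


# Demo on a throwaway file rather than a real test log.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "full.log"
    trimmed = Path(tmp) / "trimmed.log"
    full.write_text("".join(f"line {i}\n" for i in range(100)), encoding="utf-8")
    kept = tail_trim(full, trimmed, keep=80)
    first_kept = trimmed.read_text(encoding="utf-8").splitlines()[0]
print(kept, first_kept)
```

A bounded `deque` reads the log in a single pass without holding the whole file in memory, which matters when a failing suite produces a very long log.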
---
## 4. `ui/scripts/w10/concurrent_stress_audit.mjs`
**Purpose.** Trigger N mock + M live events in parallel, wait for each to
reach a terminal status, then run `scripts/verify_chain_consistency.py`
on each. Checks three system-wide invariants:
| # | invariant |
|---|-----------|
| I1 | leaderboard NOT polluted by mock-only placeholder addrs |
| I2 | every live event has a real on-chain trace (no `0xsim_…`) |
| I3 | SSE never emits a 429 under load |
**Invocation:**
```bash
node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 3
node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 0 # mock-only dry-run
```
**Output:** `/tmp/w10-stress-audit.md` (per-scenario table + invariants
+ raw trigger response bodies for debugging).
**Exit code:** 0 if every scenario AND every invariant pass; 1 on any
violation; 2 on fatal error.
**Expected runtime:** mock terminal ≈ 15-25 s; live terminal ≈ 60-90 s on
Arc testnet. Wall-clock ≈ 2-3 min for 5+3 because triggers fire in
parallel.
**Dependencies:** Backend at `:8000` + UI at `:3001` + Arc testnet
funded faucet keys in `.env` (live triggers consume gas). Playwright
+ Chromium already installed.
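
As a rough illustration of the three invariants and the 0/1/2 exit-code convention, here is a minimal Python sketch (the audit itself is a Node script). The input shapes, the sample addresses, and the function names are assumptions; only the `0xsim_` prefix and the 429 status come from the description above.

```python
def check_invariants(leaderboard_addrs, placeholder_addrs, live_tx_hashes, sse_statuses):
    """Evaluate I1-I3 on already-collected data; True means the invariant holds."""
    return {
        # I1: no mock-only placeholder address appears on the leaderboard
        "I1": set(leaderboard_addrs).isdisjoint(placeholder_addrs),
        # I2: no live event carries a simulated transaction hash
        "I2": all(not tx.startswith("0xsim_") for tx in live_tx_hashes),
        # I3: the SSE endpoint never answered 429
        "I3": 429 not in sse_statuses,
    }


def audit_exit_code(run_audit):
    """0 = everything passed, 1 = any violation, 2 = fatal error while auditing."""
    try:
        scenarios_ok, invariants = run_audit()
    except Exception:
        return 2
    return 0 if all(scenarios_ok) and all(invariants.values()) else 1


# Hypothetical data: one live event still carries a simulated hash.
invariants = check_invariants(
    leaderboard_addrs=["0xabc"],
    placeholder_addrs=["0xmock1"],
    live_tx_hashes=["0xsim_deadbeef"],
    sse_statuses=[200, 200],
)
print(invariants)
print("exit code:", audit_exit_code(lambda: ([True, True], invariants)))
```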
---
## Pre-flight checklist
Before any W10 sub-agent runs these scripts, confirm:
- [ ] backend up: `curl -fs http://localhost:8000/health` returns 200
- [ ] UI up: `curl -fs http://localhost:3001/` returns 200
- [ ] DB writable: `ls polyglot_alpha.db*` (WAL/SHM files OK)
- [ ] `.env` has `TRANSLATION_AUCTION_ADDRESS`, `QUESTION_REGISTRY_ADDRESS`,
`BUILDER_FEE_ROUTER_ADDRESS`, `REPUTATION_REGISTRY_ADDRESS`,
`JUDGE_PANEL_ADDRESS`
- [ ] Arc faucet keys in `.env` if `--live N` with `N > 0`
- [ ] W9-A column `events.judges_attestation_tx` present
(`sqlite3 polyglot_alpha.db "PRAGMA table_info(events)" | grep judges`)
- [ ] W9-E surfaces deployed (judge panel + reputation widgets on `/events/*`)
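
Two of the checklist items (required addresses and the W9-A column) lend themselves to automation. The sketch below is one possible approach, not part of the harness: `missing_env` and `has_column` are hypothetical helpers, `env` is assumed to be a mapping already loaded from `.env`, and the demo uses an in-memory database instead of `polyglot_alpha.db`.

```python
import sqlite3

REQUIRED_ENV = (
    "TRANSLATION_AUCTION_ADDRESS",
    "QUESTION_REGISTRY_ADDRESS",
    "BUILDER_FEE_ROUTER_ADDRESS",
    "REPUTATION_REGISTRY_ADDRESS",
    "JUDGE_PANEL_ADDRESS",
)


def missing_env(env):
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def has_column(conn, table, column):
    """True if `table` has `column`; `table` must be a trusted identifier."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


# Demo against an in-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, judges_attestation_tx TEXT)")
print(has_column(conn, "events", "judges_attestation_tx"))
print(missing_env({"JUDGE_PANEL_ADDRESS": "0x0"}))
```

Pointing `sqlite3.connect` at `polyglot_alpha.db` would perform the same check as the `PRAGMA table_info(events)` one-liner in the checklist.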
---
## How the four sub-agents map onto these scripts
| W10 sub-agent | script | output to consume |
|---------------|--------|-------------------|
| chain-consistency | `scripts/w10/chain_consistency_sweep.py --limit 20` | `/tmp/w10-chain-sweep.md` |
| ui-regression | `node ui/scripts/w10/ui_regression_sweep.mjs` | `/tmp/w10-ui-regression.md` |
| test-suite | `bash scripts/w10/test_suite_runner.sh` | `/tmp/w10-test-suite.summary` |
| stress-audit | `node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 3` | `/tmp/w10-stress-audit.md` |
The four sub-agents can be fanned out fully in parallel: none of them
mutate the others' state (the verifier is read-only; the test runner only
writes to `/tmp/`; stress-audit creates *new* events but reads only the
ones it itself triggered).
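
One way to fan the four commands out from a single driver is sketched below, assuming it is launched from the repository root. `run_all` and `W10_COMMANDS` are hypothetical names; the argv lists mirror the invocations documented above, and the demo substitutes stand-in commands so the sketch runs without the backend, UI, or testnet.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# The four documented invocations, expressed as argv lists.
W10_COMMANDS = {
    "chain-consistency": [".venv/bin/python", "scripts/w10/chain_consistency_sweep.py", "--limit", "20"],
    "ui-regression": ["node", "ui/scripts/w10/ui_regression_sweep.mjs"],
    "test-suite": ["bash", "scripts/w10/test_suite_runner.sh"],
    "stress-audit": ["node", "ui/scripts/w10/concurrent_stress_audit.mjs", "--mock", "5", "--live", "3"],
}


def run_all(commands):
    """Run every command concurrently and return {name: exit code}."""
    def run(argv):
        return subprocess.run(argv, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {name: pool.submit(run, argv) for name, argv in commands.items()}
        return {name: future.result() for name, future in futures.items()}


# Demo with stand-in commands instead of W10_COMMANDS.
demo = {
    "ok": [sys.executable, "-c", "pass"],
    "failing": [sys.executable, "-c", "raise SystemExit(1)"],
}
codes = run_all(demo)
print(codes)
```

Threads are sufficient here because each worker only waits on a child process; calling `run_all(W10_COMMANDS)` would return the four exit codes documented in the sections above.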