# W10 verification harness: pre-built scripts

The four scripts below are the W10-PREP deliverables: they let four W10
sub-agents kick off in parallel the moment W9-A (JudgePanel attestation)
and W9-E (final wiring) land. **Do not run these scripts before W9-A and
W9-E are merged** β€” the chain-consistency sweep relies on the
`events.judges_attestation_tx` column W9-A introduces, and the UI
regression / stress audit scripts assume the W9-A & W9-E UX surfaces are
present on `/events/*`.

All four scripts are idempotent and safe to re-run.

---

## 1. `scripts/w10/chain_consistency_sweep.py`

**Purpose.** Sweep the latest N `SUBMITTED` events through
`scripts/verify_chain_consistency.py` (NOT modified, only wrapped) and
aggregate the per-phase PASS/FAIL/SKIP counts. For every FAIL the
verifier diff is included verbatim plus a heuristic "root cause" line.

**Invocation:**

```bash
.venv/bin/python scripts/w10/chain_consistency_sweep.py             # 20 events, any mode
.venv/bin/python scripts/w10/chain_consistency_sweep.py --limit 30 --mode live
.venv/bin/python scripts/w10/chain_consistency_sweep.py --mode mock --out /tmp/w10-chain-mock.md
```

**Output:** `/tmp/w10-chain-sweep.md` (override with `--out`).
**Console:** per-phase tally summary at the end.
**Exit code:** 0 if every checked phase is PASS, 1 otherwise.
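
The per-phase aggregation and exit-code rule can be sketched as follows. This is a minimal illustration, not the sweep script's actual code: the `(event_id, phase, status)` tuple shape, the phase names, and the treatment of SKIP as "not checked" are all assumptions.

```python
from collections import Counter

STATUSES = ("PASS", "FAIL", "SKIP")


def tally_phases(results):
    """Count PASS/FAIL/SKIP per phase.

    `results` is an iterable of (event_id, phase, status) tuples; this
    record shape is hypothetical and only serves the illustration.
    """
    tally = {}
    for _event_id, phase, status in results:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        tally.setdefault(phase, Counter())[status] += 1
    return tally


def exit_code(tally):
    """Return 0 if no checked phase failed, 1 otherwise.

    Assumption: a SKIP means the phase was not checked, so it does not
    turn the sweep red.
    """
    return 1 if any(counts["FAIL"] for counts in tally.values()) else 0
```

For example, `tally_phases([(214, "auction", "PASS"), (214, "judges", "FAIL")])` yields one PASS under `auction` and one FAIL under `judges`, and `exit_code` on that tally returns 1.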

**Expected runtime:** ~6–10 s per event (most of that is RPC). 20 events
→ ~2–3 min on the Arc testnet RPC.

**Dependencies:** Python venv at `.venv/`, `web3` (already pinned in
`pyproject.toml`), `.env` populated with the contract addresses
(`TRANSLATION_AUCTION_ADDRESS`, `JUDGE_PANEL_ADDRESS`, …). All present.

---

## 2. `ui/scripts/w10/ui_regression_sweep.mjs`

**Purpose.** Re-verify the 13 W3-regression points (from
`scripts/wave3_regression.mjs`) against the current UI in BOTH `mode=live`
and `mode=mock`. Aggregates console.errors / 4xx-5xx / 429 across both
runs.

**Invocation:**

```bash
node ui/scripts/w10/ui_regression_sweep.mjs
# override target events:
W10_EVENT_LIVE=214 W10_EVENT_MOCK=213 node ui/scripts/w10/ui_regression_sweep.mjs
```

**Output:** `/tmp/w10-ui-regression.md` (per-fix PASS/FAIL matrix +
console / network sections).
**Exit code:** 0 if every check passes in both modes, 1 if any fails.

**Expected runtime:** ~90–120 s (13 checks × 2 modes × page navigation).

**Dependencies:** UI on `:3001` and API on `:8000`. Playwright + Chromium
already installed under `ui/node_modules`. No new deps.

**Sample matrix row (expected when clean):**

```
| R1 | Phase 4 judge panel renders 11 judges | PASS | PASS |
| R9 | SSE rate-limit: 5 rapid reloads → 0 × 429 | PASS | PASS |
```
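
A matrix row of this shape, together with the "every check passes in both modes" exit rule, could be produced roughly as below. This is a Python sketch for illustration only (the real script is a Node module); the column order live-then-mock and the function names are assumptions.

```python
def matrix_row(check_id, description, live_ok, mock_ok):
    """Render one PASS/FAIL matrix row.

    Assumption: the third column is mode=live and the fourth is mode=mock.
    """
    cells = [
        check_id,
        description,
        "PASS" if live_ok else "FAIL",
        "PASS" if mock_ok else "FAIL",
    ]
    return "| " + " | ".join(cells) + " |"


def sweep_exit_code(checks):
    """Return 0 only if every (live_ok, mock_ok) pair is fully green."""
    return 0 if all(live and mock for live, mock in checks) else 1
```

A check that passes in live mode but fails in mock mode still makes the whole sweep exit with 1.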

---

## 3. `scripts/w10/test_suite_runner.sh`

**Purpose.** Run pytest + jest + `tsc --noEmit` one after the other,
tail-trim each log into `/tmp`, then print a summary block.

**Invocation:**

```bash
bash scripts/w10/test_suite_runner.sh
bash scripts/w10/test_suite_runner.sh --no-jest          # skip UI suite
bash scripts/w10/test_suite_runner.sh --no-pytest        # skip backend
```

**Outputs:**

- `/tmp/w10-pytest.log`: last 80 lines of pytest stdout
- `/tmp/w10-jest.log`: last 60 lines of jest output
- `/tmp/w10-tsc.log`: last 40 lines of `tsc --noEmit`
- `/tmp/w10-test-suite.summary`: distilled pass/fail counts

**Exit code:** 0 if all three suites pass and `tsc` reports 0 errors;
non-zero otherwise.
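
The tail-trimming and overall exit rule can be pictured with the short Python sketch below. The runner itself is a shell script, so this only models the behaviour described above; the helper names are hypothetical, and returning exactly 1 for "non-zero" is an assumption.

```python
from collections import deque

# Line limits taken from the output list above.
TAIL_LIMITS = {"pytest": 80, "jest": 60, "tsc": 40}


def tail_trim(output, max_lines):
    """Keep only the last `max_lines` lines of a suite's output."""
    return "\n".join(deque(output.splitlines(), maxlen=max_lines))


def overall_exit_code(return_codes):
    """Return 0 if every suite exited cleanly, otherwise 1.

    `return_codes` maps a suite name to its process return code.
    """
    return 0 if all(rc == 0 for rc in return_codes.values()) else 1
```

With this model, a 100-line pytest log is cut to its final 80 lines, and a single failing suite is enough to make the runner exit non-zero.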

**Expected runtime:** pytest ≈ 60–90 s, jest ≈ 30–45 s, tsc ≈ 20–30 s →
total ≈ 2–3 min.

**Dependencies:** `.venv/bin/python` (fallback: `python3`),
`ui/node_modules/.bin/jest` and `tsc` (already installed).

---

## 4. `ui/scripts/w10/concurrent_stress_audit.mjs`

**Purpose.** Trigger N mock + M live events in parallel, wait for each to
reach a terminal status, then run `scripts/verify_chain_consistency.py`
on each. Checks three system-wide invariants:

| # | invariant |
|---|-----------|
| I1 | leaderboard NOT polluted by mock-only placeholder addrs |
| I2 | every live event has a real on-chain trace (no `0xsim_…`) |
| I3 | SSE never emits a 429 under load |
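
As an illustration of what the three invariants amount to, here is a minimal Python sketch. The audit itself is a Node script, and all four input shapes below (address lists, transaction-hash strings, observed HTTP status codes) are assumptions made for the example.

```python
def check_invariants(leaderboard_addrs, mock_placeholder_addrs,
                     live_tx_hashes, sse_statuses):
    """Evaluate I1..I3 and return a dict mapping invariant id to bool.

    All parameters are hypothetical inputs:
      leaderboard_addrs       addresses currently on the leaderboard
      mock_placeholder_addrs  placeholder addresses used only by mock runs
      live_tx_hashes          one on-chain tx hash per live event
      sse_statuses            HTTP status codes seen on SSE requests
    """
    return {
        # I1: no mock-only placeholder address appears on the leaderboard.
        "I1": not (set(leaderboard_addrs) & set(mock_placeholder_addrs)),
        # I2: every live event has a non-empty hash that is not simulated.
        "I2": all(bool(tx) and not tx.startswith("0xsim_")
                  for tx in live_tx_hashes),
        # I3: no SSE request was rate-limited.
        "I3": 429 not in sse_statuses,
    }
```

Note that with no live events (for example a mock-only dry-run) I2 holds vacuously in this sketch, because `all()` over an empty sequence is true.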

**Invocation:**

```bash
node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 3
node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 0   # mock-only dry-run
```

**Output:** `/tmp/w10-stress-audit.md` (per-scenario table + invariants
+ raw trigger response bodies for debugging).

**Exit code:** 0 if every scenario AND every invariant pass; 1 on any
violation; 2 on fatal error.

**Expected runtime:** mock terminal ≈ 15–25 s; live terminal ≈ 60–90 s on
Arc testnet. Wall-clock ≈ 2–3 min for 5+3 because triggers fire in
parallel.

**Dependencies:** Backend at `:8000` + UI at `:3001` + Arc testnet
funded faucet keys in `.env` (live triggers consume gas). Playwright
+ Chromium already installed.

---

## Pre-flight checklist

Before any W10 sub-agent runs these scripts, confirm:

- [ ] backend up: `curl -fs http://localhost:8000/health` returns 200
- [ ] UI up: `curl -fs http://localhost:3001/` returns 200
- [ ] DB writable: `ls polyglot_alpha.db*` (WAL/SHM files OK)
- [ ] `.env` has `TRANSLATION_AUCTION_ADDRESS`, `QUESTION_REGISTRY_ADDRESS`,
      `BUILDER_FEE_ROUTER_ADDRESS`, `REPUTATION_REGISTRY_ADDRESS`,
      `JUDGE_PANEL_ADDRESS`
- [ ] Arc faucet keys in `.env` if `--live N` with `N > 0`
- [ ] W9-A column `events.judges_attestation_tx` present
      (`sqlite3 polyglot_alpha.db "PRAGMA table_info(events)" | grep judges`)
- [ ] W9-E surfaces deployed (judge panel + reputation widgets on `/events/*`)
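
Two of the checklist items (the `.env` keys and the W9-A column) lend themselves to a scripted check. The sketch below is one possible way to do it in Python, assuming a plain `KEY=value` `.env` format; the helper names are hypothetical and nothing like this is claimed to exist in the repository.

```python
import sqlite3

# Keys listed in the checklist above.
REQUIRED_ENV_KEYS = (
    "TRANSLATION_AUCTION_ADDRESS",
    "QUESTION_REGISTRY_ADDRESS",
    "BUILDER_FEE_ROUTER_ADDRESS",
    "REPUTATION_REGISTRY_ADDRESS",
    "JUDGE_PANEL_ADDRESS",
)


def missing_env_keys(env_text):
    """Return the required keys that are absent or empty in `.env` text.

    Assumption: simple KEY=value lines, with `#` starting a comment line.
    """
    present = set()
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():
            present.add(key.strip())
    return [key for key in REQUIRED_ENV_KEYS if key not in present]


def has_column(conn, table, column):
    """Check for a column via PRAGMA table_info (row[1] is the column name).

    `table` is interpolated into the SQL, so pass trusted names only.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)
```

A caller could open `polyglot_alpha.db` with `sqlite3.connect(...)` and call `has_column(conn, "events", "judges_attestation_tx")`, which mirrors the `sqlite3 ... | grep judges` one-liner in the checklist.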

---

## How the four sub-agents map onto these scripts

| W10 sub-agent | script | output to consume |
|---------------|--------|-------------------|
| chain-consistency | `scripts/w10/chain_consistency_sweep.py --limit 20` | `/tmp/w10-chain-sweep.md` |
| ui-regression | `node ui/scripts/w10/ui_regression_sweep.mjs` | `/tmp/w10-ui-regression.md` |
| test-suite | `bash scripts/w10/test_suite_runner.sh` | `/tmp/w10-test-suite.summary` |
| stress-audit | `node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 3` | `/tmp/w10-stress-audit.md` |

The four sub-agents can be fanned out fully in parallel: none of them
mutates the others' state (the verifier is read-only; the test runner only
writes to `/tmp/`; stress-audit creates *new* events but reads only the
ones it itself triggered).
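
For a single-machine picture of that fan-out, the following Python sketch launches the four commands from the table concurrently and collects their exit codes. It is an assumption-laden illustration: how the sub-agents are actually orchestrated is not specified here, and `fan_out` and `COMMANDS` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Commands copied from the mapping table; run from the repository root.
COMMANDS = {
    "chain-consistency": [".venv/bin/python",
                          "scripts/w10/chain_consistency_sweep.py",
                          "--limit", "20"],
    "ui-regression": ["node", "ui/scripts/w10/ui_regression_sweep.mjs"],
    "test-suite": ["bash", "scripts/w10/test_suite_runner.sh"],
    "stress-audit": ["node", "ui/scripts/w10/concurrent_stress_audit.mjs",
                     "--mock", "5", "--live", "3"],
}


def fan_out(commands):
    """Run every command in parallel; return a name -> return code dict."""
    def run(cmd):
        # Output is captured so the parallel runs do not interleave on the
        # console; each script writes its own report file anyway.
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {name: pool.submit(run, cmd)
                   for name, cmd in commands.items()}
        return {name: future.result() for name, future in futures.items()}
```

Calling `fan_out(COMMANDS)` after the pre-flight checklist passes would return one exit code per sub-agent; threads are sufficient here because each worker only waits on a child process.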