Spaces:

messili
/

polyglot-alpha

Running

App Files Files Community

polyglot-alpha / scripts /w10 /README.md

licaomeng

deploy: main@8970ffb → HF Spaces (2026-05-27T05:19Z)

88d2f2a 5 days ago

preview code

raw

history blame contribute delete

6.35 kB

W10 verification harness — pre-built scripts

The four scripts below are the W10-PREP deliverables: they let four W10 sub-agents kick off in parallel the moment W9-A (JudgePanel attestation) and W9-E (final wiring) land. Do not run these scripts before W9-A and W9-E are merged — the chain-consistency sweep relies on the events.judges_attestation_tx column W9-A introduces, and the UI regression / stress audit scripts assume the W9-A & W9-E UX surfaces are present on /events/*.

All four scripts are idempotent + safe to re-run.

1. `scripts/w10/chain_consistency_sweep.py`

Purpose. Sweep the latest N SUBMITTED events through scripts/verify_chain_consistency.py (NOT modified — wrapped) and aggregate the per-phase PASS/FAIL/SKIP counts. For every FAIL the verifier diff is included verbatim plus a heuristic "root cause" line.

Invocation:

.venv/bin/python scripts/w10/chain_consistency_sweep.py             # 20 events, any mode
.venv/bin/python scripts/w10/chain_consistency_sweep.py --limit 30 --mode live
.venv/bin/python scripts/w10/chain_consistency_sweep.py --mode mock --out /tmp/w10-chain-mock.md

Output: /tmp/w10-chain-sweep.md (override with --out). Console: per-phase tally summary at the end. Exit code: 0 if every checked phase is PASS, 1 otherwise.

Expected runtime: ~6–10 s per event (most of that is RPC). 20 events → ~2–3 min on the Arc testnet RPC.

Dependencies: Python venv at .venv/, web3 (already pinned in pyproject.toml), .env populated with the contract addresses (TRANSLATION_AUCTION_ADDRESS, JUDGE_PANEL_ADDRESS, …). All present.

2. `ui/scripts/w10/ui_regression_sweep.mjs`

Purpose. Re-verify the 13 W3-regression points (from scripts/wave3_regression.mjs) against the current UI in BOTH mode=live and mode=mock. Aggregates console.errors / 4xx-5xx / 429 across both runs.

Invocation:

node ui/scripts/w10/ui_regression_sweep.mjs
# override target events:
W10_EVENT_LIVE=214 W10_EVENT_MOCK=213 node ui/scripts/w10/ui_regression_sweep.mjs

Output: /tmp/w10-ui-regression.md (per-fix PASS/FAIL matrix + console / network sections). Exit code: 0 if every check passes in both modes, 1 if any fails.

Expected runtime: ~90–120 s (13 checks × 2 modes × page navigation).

Dependencies: UI on :3001 and API on :8000. Playwright + Chromium already installed under ui/node_modules. No new deps.

Sample matrix row (expected when clean):

| R1 | Phase 4 judge panel renders 11 judges | PASS | PASS |
| R9 | SSE rate-limit — 5 rapid reloads → 0 × 429 | PASS | PASS |

3. `scripts/w10/test_suite_runner.sh`

Purpose. Run pytest + jest + tsc --noEmit one after the other, tail-trim each log into /tmp, then print a summary block.

Invocation:

bash scripts/w10/test_suite_runner.sh
bash scripts/w10/test_suite_runner.sh --no-jest          # skip UI suite
bash scripts/w10/test_suite_runner.sh --no-pytest        # skip backend

Outputs:

/tmp/w10-pytest.log — last 80 lines of pytest stdout
/tmp/w10-jest.log — last 60 lines of jest output
/tmp/w10-tsc.log — last 40 lines of tsc --noEmit
/tmp/w10-test-suite.summary — distilled pass/fail counts

Exit code: 0 if all three suites pass and tsc reports 0 errors; non-zero otherwise.

Expected runtime: pytest ≈ 60–90 s, jest ≈ 30–45 s, tsc ≈ 20–30 s → total ≈ 2–3 min.

Dependencies: .venv/bin/python (fallback: python3), ui/node_modules/.bin/jest and tsc (already installed).

4. `ui/scripts/w10/concurrent_stress_audit.mjs`

Purpose. Trigger N mock + M live events in parallel, wait for each to reach a terminal status, then run scripts/verify_chain_consistency.py on each. Checks three system-wide invariants:

#	invariant
I1	leaderboard NOT polluted by mock-only placeholder addrs
I2	every live event has a real on-chain trace (no `0xsim_…`)
I3	SSE never emits a 429 under load

Invocation:

node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 3
node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 0   # mock-only dry-run

Output: /tmp/w10-stress-audit.md (per-scenario table + invariants

raw trigger response bodies for debugging).

Exit code: 0 if every scenario AND every invariant pass; 1 on any violation; 2 on fatal error.

Expected runtime: mock terminal ≈ 15–25 s; live terminal ≈ 60–90 s on Arc testnet. Wall-clock ≈ 2–3 min for 5+3 because triggers fire in parallel.

Dependencies: Backend at :8000 + UI at :3001 + Arc testnet funded faucet keys in .env (live triggers consume gas). Playwright

Chromium already installed.

Pre-flight checklist

Before any W10 sub-agent runs these scripts, confirm:

backend up: curl -fs http://localhost:8000/health returns 200
UI up: curl -fs http://localhost:3001/ returns 200
DB writable: ls polyglot_alpha.db* (WAL/SHM files OK)
.env has TRANSLATION_AUCTION_ADDRESS, QUESTION_REGISTRY_ADDRESS, BUILDER_FEE_ROUTER_ADDRESS, REPUTATION_REGISTRY_ADDRESS, JUDGE_PANEL_ADDRESS
Arc faucet keys in .env if --live N with N > 0
W9-A column events.judges_attestation_tx present (sqlite3 polyglot_alpha.db "PRAGMA table_info(events)" | grep judges)
W9-E surfaces deployed (judge panel + reputation widgets on /events/*)

How the four sub-agents map onto these scripts

W10 sub-agent	script	output to consume
chain-consistency	`scripts/w10/chain_consistency_sweep.py --limit 20`	`/tmp/w10-chain-sweep.md`
ui-regression	`node ui/scripts/w10/ui_regression_sweep.mjs`	`/tmp/w10-ui-regression.md`
test-suite	`bash scripts/w10/test_suite_runner.sh`	`/tmp/w10-test-suite.summary`
stress-audit	`node ui/scripts/w10/concurrent_stress_audit.mjs --mock 5 --live 3`	`/tmp/w10-stress-audit.md`

The four sub-agents can be fanned out fully in parallel — none of them mutate the others' state (the verifier is read-only; the test runner only writes to /tmp/; stress-audit creates new events but reads only the ones it itself triggered).

W10 verification harness — pre-built scripts

1. scripts/w10/chain_consistency_sweep.py

2. ui/scripts/w10/ui_regression_sweep.mjs

3. scripts/w10/test_suite_runner.sh

4. ui/scripts/w10/concurrent_stress_audit.mjs

Pre-flight checklist

How the four sub-agents map onto these scripts

1. `scripts/w10/chain_consistency_sweep.py`

2. `ui/scripts/w10/ui_regression_sweep.mjs`

3. `scripts/w10/test_suite_runner.sh`

4. `ui/scripts/w10/concurrent_stress_audit.mjs`