licaomeng

deploy: main@8970ffb → HF Spaces (2026-05-27T05:19Z)


	# README v3 → v4 — Change Log

	Date: 2026-05-26
	v3: 653 lines · 6284 words
	v4: 699 lines · 7162 words
	Delta: +46 lines · +878 words (well under the +150-line / 800-line budget)
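
The delta line can be re-derived from the raw counts. A minimal sketch, assuming the budget means at most +150 lines over v3 and at most 800 lines in total; the `within_budget` helper and its parameter names are hypothetical and not part of the repo:

```python
def within_budget(v3_lines, v4_lines, max_delta=150, max_total=800):
    """Return True when v4 stays inside both line budgets (assumed reading)."""
    delta = v4_lines - v3_lines
    return delta <= max_delta and v4_lines <= max_total


line_delta = 699 - 653      # lines added between v3 and v4
word_delta = 7162 - 6284    # words added between v3 and v4
print(line_delta, word_delta, within_budget(653, 699))
```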

	The v4 pass incorporates findings from the overnight 14-sub-agent stress loop
	([`outputs/MASTER_REPORT.md`](./MASTER_REPORT.md), [`outputs/BUG_BACKLOG.md`](./BUG_BACKLOG.md))
	while preserving v3's Medium-quality voice.

	---

	## What changed

	### 1. Badges (top of README)

	- Tests badge: `149 Py + 30 Sol + 15 FE` → `219 Py + 36 Jest + 30 Foundry` (real counts post-overnight)
	- Slither badge: clarifies "0 High / 0 Medium" applies to first-party code (OZ `Math.sol` library noise excluded)
	- New Smoke badge: `Smoke 10/12 GREEN` linking to MASTER_REPORT.md
	- Tests + Slither badges now point at `MASTER_REPORT.md` instead of `final_audit_summary.md` (more current)

	### 2. TL;DR paragraph 3

	Added: builder code `0xa934…beb1` registered, Alchemy Polygon RPC bound,
	overnight stress loop reference, real-stack coverage figure (~85%),
	smoke 10/12 GREEN.

	### 3. Real vs Mock: Honest Accounting

	- Phase 1 chain glue + dispatch rows: `PHASE 1 (landing)` → `REAL (smoke verified)`
	- New row: Alchemy Polygon RPC binding marked REAL with median latency
	- Coverage estimate: 25–30% → ~85% with provenance (Phase 1 + overnight verification)
	- WARNING callout rewritten: the remaining gap is BLEU/COMET reference-lookup wiring
	  (HIGH-1 in BUG_BACKLOG), not chain glue. MQM, the most informative of the three
	  metrics, is real.

	### 4. The Numbers table

	Replaced "target 60–75 s" with measured values from `perf_benchmark.md`:

	| Row | v3 | v4 |
	|-----|-----|-----|
	| Lifecycle wall clock | 60–75 s target | p50 65.87 s measured, p95 ≥180 s on stalls |
	| API p95 | — | 8.7–29.3 ms (`/events`, `/leaderboard`, `/events/{id}`) |
	| Backend cold start | — | 1.65 s + Next.js FCP 90–760 ms |
	| FAISS lookup median | — | 16.07 ms vs 100 ms budget |
	| Arc RPC `eth_blockNumber` | — | p50 590.6 ms · p95 828.3 ms |
	| Test suites total | — | 285 pass (219 + 36 + 30) |
	| Slither verdict | post-hardening | post-hardening, first-party scope clarified |
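
For readers who want to reproduce p50/p95 figures like those in the table, here is a minimal sketch using only the standard library. The sample data is hypothetical and is not taken from `perf_benchmark.md`; the benchmark itself may use a different percentile method.

```python
import statistics


def p50_p95(samples_ms):
    """Return (p50, p95) for a list of latency samples in milliseconds."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99,
    # so index 49 is the 50th percentile and index 94 is the 95th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Hypothetical samples: 1.0 ms through 101.0 ms in 1 ms steps.
samples = [float(x) for x in range(1, 102)]
p50, p95 = p50_p95(samples)
print(p50, p95)
```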

	### 5. NEW SECTION — "Stress Tested Overnight (2026-05-26, 04:30–08:00 SGT)"

	Position: before "Audit + Hardening Pass", since the stress loop is the more recent event.

	Covers:
	- 14 sub-agents launched in 3 waves over 3.5 hours
	- 600+ check items across 9 domains
	- 47 bugs catalogued, 27 auto-fixed
	- Before/after table on 9 surfaces (smoke 4→10/12, mobile 47%→81%, etc.)
	- 121 screenshots produced
	- Demo readiness verdict: GREEN mechanism / YELLOW market with explanation
	- Pointers to MASTER_REPORT.md and BUG_BACKLOG.md

	Voice: "we put it through a stress test loop" — not boastful, just earned.

	### 6. Audit + Hardening Pass

	One-line transition update: "Before the overnight stress loop, an earlier
	8-audit parallel pass ran…" — chronology now reads cleanly.

	### 7. Roadmap

	Added a new "Production hardening" phase (1–4 weeks post-ship) capturing
	the three concrete Agent H production recommendations, plus the two open HIGH items:
	- `BackgroundTasks` migration for `/trigger/event` (BLOCKER)
	- LLM timeout + circuit breaker around the 4-provider fan-out
	- `gunicorn --workers 4` + reverse proxy
	- BLEU/COMET reference-lookup wiring (HIGH-1)
	- Firefox SSE CORS fix (HIGH-2)

	Plus a follow-up paragraph explaining what each is, what surfaced it
	(Agent H perf benchmark), and that they are roughly day-of-work fixes,
	not architecture changes.
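
To make the "LLM timeout + circuit breaker" item concrete, here is a minimal sketch of the pattern using only `asyncio`. It is not the project's implementation: the `CircuitBreaker` class, `fan_out` helper, provider names, and thresholds are all hypothetical.

```python
import asyncio
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `reset_s`."""

    def __init__(self, max_failures=3, reset_s=30.0):
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Allow calls again once the cool-down has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()


async def call_with_guard(name, fn, breaker, timeout_s):
    """Call one provider with a timeout; return (name, result or None)."""
    if not breaker.allow():
        return name, None  # circuit open: skip without calling
    try:
        result = await asyncio.wait_for(fn(), timeout=timeout_s)
    except Exception:  # includes timeouts raised by wait_for
        breaker.record(False)
        return name, None
    breaker.record(True)
    return name, result


async def fan_out(providers, breakers, timeout_s=0.05):
    """Query all providers concurrently; keep those that answered in time."""
    tasks = [call_with_guard(n, fn, breakers[n], timeout_s)
             for n, fn in providers.items()]
    return {n: r for n, r in await asyncio.gather(*tasks) if r is not None}


# Hypothetical providers: three fast stubs and one that stalls.
async def fast():
    return "ok"


async def slow():
    await asyncio.sleep(1.0)
    return "late"


providers = {"a": fast, "b": fast, "c": fast, "d": slow}
breakers = {n: CircuitBreaker(max_failures=1) for n in providers}
results = asyncio.run(fan_out(providers, breakers))
print(sorted(results))
```

The stalled provider is dropped after the timeout and its breaker opens, so later fan-outs skip it until the cool-down passes instead of waiting on it again.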

	### 8. Arc capabilities table

	Added Alchemy Polygon RPC row with app id `ngx37mo60qae6ror` and median latency.

	### 9. "The Numbers" intro polish

	Opener now lists the concrete corpus + event + bid + submission + test counts
	that back the "real data, not just simulated" claim.

	---

	## What stayed the same (deliberate)

	- 6 Mermaid diagrams (none added, none removed)
	- All 24 unique §5.X cross-reference anchors still resolve
	- Section order unchanged except the new "Stress Tested Overnight" insertion
	- Voice / tone preserved from v3 (Medium-quality narrative)
	- §5.30 honest-scope discipline maintained: no claim about proof-of-market
	- Mechanism design defaults table (locked parameters) untouched
	- License + Closing Thesis untouched

	---

	## Anchor-resolve verification

	Unique §5.X anchors in v4 (24 distinct):
	50, 502, 503, 510, 515, 518, 521, 522, 527, 528, 530, 540, 5402, 541, 542,
	543, 544, 546, 547, 548, 55, 551, 56, 57

	All identical to v3 anchor set. No anchors added or dropped.
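
A check like this can be scripted. The sketch below assumes anchors are formed by dropping the dots from each `§` reference (so `§5.40.2` becomes `5402`), which matches the list above but is an inference. The `anchors` and `anchor_diff` helpers and the sample strings are hypothetical.

```python
import re

# Matches references such as "§5.30" or "§5.40.2"; a trailing sentence
# period is not captured because a digit must follow each dot.
SECTION_REF = re.compile(r"§(\d+(?:\.\d+)*)")


def anchors(markdown):
    """Collect the unique anchors for every §-style section reference."""
    # Assumed convention: "§5.40.2" maps to anchor "5402" (dots removed).
    return {ref.replace(".", "") for ref in SECTION_REF.findall(markdown)}


def anchor_diff(old, new):
    """Return (added, dropped) anchors going from old to new."""
    a_old, a_new = anchors(old), anchors(new)
    return a_new - a_old, a_old - a_new


# Hypothetical excerpts standing in for the two README versions.
v3 = "See §5.30 and §5.40.2. Also §5.0."
v4 = "See §5.30, then §5.0; details in §5.40.2 and again §5.30."
added, dropped = anchor_diff(v3, v4)
print(sorted(anchors(v4)), added, dropped)
```

Two empty sets mean no anchors were added or dropped between versions.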