| <!doctype html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| <title>The browser is the real agent benchmark</title> |
| <meta name="description" content="A real-world browser automation checklist for AI agents: auth, iframes, Shadow DOM, rich-text editors, and receipts." /> |
| <style> |
| :root { color-scheme: dark; --bg: #090b12; --fg: #eef2ff; --muted: #a8b3cf; --card: #111827; --line: #263044; --accent: #8b5cf6; --accent2: #38bdf8; } |
| * { box-sizing: border-box; } |
| body { margin: 0; font-family: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; background: radial-gradient(circle at top left, rgba(139,92,246,.22), transparent 35%), radial-gradient(circle at bottom right, rgba(56,189,248,.16), transparent 30%), var(--bg); color: var(--fg); line-height: 1.6; } |
| main { max-width: 920px; margin: 0 auto; padding: 64px 24px 80px; } |
| .eyebrow { color: var(--accent2); text-transform: uppercase; letter-spacing: .18em; font-size: .8rem; font-weight: 700; } |
| h1 { font-size: clamp(2.3rem, 6vw, 5rem); line-height: .96; margin: 18px 0 22px; letter-spacing: -.05em; } |
| .lede { font-size: clamp(1.1rem, 2vw, 1.35rem); color: var(--muted); max-width: 720px; } |
| .actions { display: flex; flex-wrap: wrap; gap: 12px; margin: 30px 0 46px; } |
| a.button { color: white; text-decoration: none; border: 1px solid var(--line); padding: 12px 16px; border-radius: 999px; background: rgba(255,255,255,.05); } |
| a.primary { background: linear-gradient(135deg, var(--accent), var(--accent2)); border: 0; color: #06111f; font-weight: 800; } |
| section { background: rgba(17,24,39,.72); border: 1px solid var(--line); border-radius: 22px; padding: 24px; margin: 18px 0; backdrop-filter: blur(12px); } |
| h2 { margin-top: 0; letter-spacing: -.02em; } |
| li { margin: 8px 0; } |
| code { background: rgba(255,255,255,.08); padding: 2px 6px; border-radius: 6px; } |
| footer { color: var(--muted); margin-top: 34px; font-size: .95rem; } |
| </style> |
| </head> |
| <body> |
| <main> |
| <div class="eyebrow">Real-world agent evals</div> |
| <h1>The browser is the real agent benchmark</h1> |
| <p class="lede">Most agent demos skip the part that breaks in production: real websites. That means auth flows, iframes, Shadow DOM, rich-text editors, redirects, and passkeys, plus receipts that prove the action actually happened.</p> |
| <div class="actions"> |
| <a class="button primary" href="https://allowayai.substack.com/p/the-browser-is-the-real-agent-benchmark">Read the full write-up</a> |
| <a class="button" href="https://github.com/ianalloway">GitHub</a> |
| <a class="button" href="https://ianalloway.xyz">Portfolio</a> |
| </div> |
| <section> |
| <h2>What the benchmark cares about</h2> |
| <ul> |
| <li>Persistent authenticated browser profiles.</li> |
| <li>MFA and CAPTCHA handoff to a human without leaking credentials.</li> |
| <li>Iframe switching and Shadow DOM traversal.</li> |
| <li>ProseMirror and Quill rich-text editors.</li> |
| <li>Passkey/WebAuthn prompts that sit outside the DOM.</li> |
| <li>Responsive layout shifts and hidden controls.</li> |
| <li>Final-state verification instead of trusting a successful process exit.</li> |
| </ul> |
| </section> |
| <section> |
| <h2>Receipts over vibes</h2> |
| <p>If an agent says it posted, submitted, updated, or paid for something, it needs a receipt: a final URL, a DOM state, a server response, a public page check, or a screenshot.</p> |
| <p>That is the difference between a chatbot and infrastructure.</p> |
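| <p>The receipt idea above can be sketched as a small check that refuses to report success without evidence. This is only an illustration: <code>Receipt</code>, <code>Expectation</code>, and <code>verifyReceipt</code> are hypothetical names, not part of the write-up or any library, and the sketch assumes the agent has already collected the final URL, the page text, and the response status.</p> |

```typescript
// Hypothetical shapes for illustration; none of these names come from the original post.
type Receipt = {
  finalUrl?: string;       // URL the browser ended on after the action
  domText?: string;        // text read back from the page after the action
  serverStatus?: number;   // HTTP status of the submit request
  screenshotPath?: string; // where a screenshot was saved, kept for human review
};

type Expectation = {
  urlIncludes?: string; // substring the final URL must contain
  domIncludes?: string; // substring the page text must contain
  okStatus?: boolean;   // require a 2xx server response
};

// Returns the list of failed checks. An empty list means the receipt holds up.
function verifyReceipt(receipt: Receipt, expected: Expectation): string[] {
  const failures: string[] = [];
  let checks = 0;

  if (expected.urlIncludes !== undefined) {
    checks += 1;
    if (!(receipt.finalUrl ?? "").includes(expected.urlIncludes)) {
      failures.push(`final URL does not contain "${expected.urlIncludes}"`);
    }
  }

  if (expected.domIncludes !== undefined) {
    checks += 1;
    if (!(receipt.domText ?? "").includes(expected.domIncludes)) {
      failures.push(`page text does not contain "${expected.domIncludes}"`);
    }
  }

  if (expected.okStatus) {
    checks += 1;
    const status = receipt.serverStatus;
    // Treat only 2xx responses as success.
    if (status === undefined || Math.floor(status / 100) !== 2) {
      failures.push(`server status is ${status ?? "missing"}, expected 2xx`);
    }
  }

  // A run that checked nothing has no receipt, however cleanly it exited.
  if (checks === 0) {
    failures.push("no evidence was checked");
  }

  return failures;
}

// Example: an agent claims it published a post (all values are made up).
const failures = verifyReceipt(
  { finalUrl: "https://example.com/posts/123", domText: "Your post is live", serverStatus: 201 },
  { urlIncludes: "/posts/", domIncludes: "post is live", okStatus: true },
);
console.log(failures.length === 0 ? "receipt verified" : failures.join("; "));
```

| <p>Returning the list of failed checks, rather than a single boolean, lets the agent report exactly which piece of evidence is missing.</p> |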
| </section> |
| <footer>Built by Ian Alloway · public companion page for the Substack post.</footer> |
| </main> |
| </body> |
| </html> |
|
|