<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>The browser is the real agent benchmark</title>
<meta name="description" content="A real-world browser automation checklist for AI agents: auth, iframes, Shadow DOM, rich-text editors, and receipts." />
<style>
:root { color-scheme: dark; --bg: #090b12; --fg: #eef2ff; --muted: #a8b3cf; --card: #111827; --line: #263044; --accent: #8b5cf6; --accent2: #38bdf8; }
* { box-sizing: border-box; }
body { margin: 0; font-family: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif; background: radial-gradient(circle at top left, rgba(139,92,246,.22), transparent 35%), radial-gradient(circle at bottom right, rgba(56,189,248,.16), transparent 30%), var(--bg); color: var(--fg); line-height: 1.6; }
main { max-width: 920px; margin: 0 auto; padding: 64px 24px 80px; }
.eyebrow { color: var(--accent2); text-transform: uppercase; letter-spacing: .18em; font-size: .8rem; font-weight: 700; }
h1 { font-size: clamp(2.3rem, 6vw, 5rem); line-height: .96; margin: 18px 0 22px; letter-spacing: -.05em; }
.lede { font-size: clamp(1.1rem, 2vw, 1.35rem); color: var(--muted); max-width: 720px; }
.actions { display: flex; flex-wrap: wrap; gap: 12px; margin: 30px 0 46px; }
a.button { color: white; text-decoration: none; border: 1px solid var(--line); padding: 12px 16px; border-radius: 999px; background: rgba(255,255,255,.05); }
a.primary { background: linear-gradient(135deg, var(--accent), var(--accent2)); border: 0; color: #06111f; font-weight: 800; }
section { background: rgba(17,24,39,.72); border: 1px solid var(--line); border-radius: 22px; padding: 24px; margin: 18px 0; backdrop-filter: blur(12px); }
h2 { margin-top: 0; letter-spacing: -.02em; }
li { margin: 8px 0; }
code { background: rgba(255,255,255,.08); padding: 2px 6px; border-radius: 6px; }
footer { color: var(--muted); margin-top: 34px; font-size: .95rem; }
</style>
</head>
<body>
<main>
<div class="eyebrow">Real-world agent evals</div>
<h1>The browser is the real agent benchmark</h1>
<p class="lede">Most agent demos skip the part that breaks in production: real websites. That means auth flows, iframes, Shadow DOM, rich-text editors, redirects, and passkeys, plus verification receipts that prove an action actually landed.</p>
<div class="actions">
<a class="button primary" href="https://allowayai.substack.com/p/the-browser-is-the-real-agent-benchmark">Read the full write-up</a>
<a class="button" href="https://github.com/ianalloway">GitHub</a>
<a class="button" href="https://ianalloway.xyz">Portfolio</a>
</div>
<section>
<h2>What the benchmark cares about</h2>
<ul>
<li>Persistent authenticated browser profiles.</li>
<li>MFA and CAPTCHA handoff without leaking credentials.</li>
<li>Iframe switching and Shadow DOM traversal.</li>
<li>ProseMirror and Quill rich-text editors.</li>
<li>Passkey/WebAuthn prompts that sit outside the DOM.</li>
<li>Responsive layout shifts and hidden controls.</li>
<li>Final-state verification instead of trusting a process exit.</li>
</ul>
</section>
<section>
<h2>Receipts over vibes</h2>
<p>If an agent says it posted, submitted, updated, or paid for something, it needs a receipt: a final URL, a DOM state, a server response, a public page check, or a screenshot.</p>
<p>That is the difference between a chatbot and infrastructure.</p>
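<p>As a rough illustration of the receipt idea, here is a minimal Python sketch, assuming the agent can read back a final URL, an optional HTTP status, and the visible page text after acting. The names <code>Receipt</code> and <code>verify_receipt</code>, the field names, and the <code>example.com</code> values are hypothetical and not taken from the Substack post or any particular framework.</p>
<pre>
```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Receipt:
    """Evidence captured after an agent action (hypothetical shape)."""
    final_url: str                         # where the browser ended up
    status_code: Optional[int] = None      # server response, if one was observed
    dom_text: str = ""                     # visible text read back from the page
    screenshot_path: Optional[str] = None  # optional artifact for human review


def verify_receipt(receipt: Receipt, expected_host: str, expected_text: str) -> List[str]:
    """Return a list of failures; an empty list means the receipt checks out."""
    failures = []

    # Final URL: the agent should have landed on the expected site,
    # not on a login wall or an error redirect.
    host = urlparse(receipt.final_url).hostname or ""
    if host != expected_host:
        failures.append(f"final URL host is {host!r}, expected {expected_host!r}")

    # Server response: only checked when a status was actually captured.
    if receipt.status_code is not None and receipt.status_code not in range(200, 300):
        failures.append(f"server responded with status {receipt.status_code}")

    # DOM state: the confirmation text must be present on the final page.
    if expected_text not in receipt.dom_text:
        failures.append("expected text not found in final DOM state")

    return failures


# Example values only; a real run would fill these from the browser session.
receipt = Receipt(
    final_url="https://example.com/posts/123",
    status_code=200,
    dom_text="Your post was published.",
)
print(verify_receipt(receipt, "example.com", "post was published"))
```
</pre>
<p>Returning a list of failures rather than a single boolean keeps the evidence attached to the verdict, so a failed run says which check did not hold.</p>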
</section>
<footer>Built by Ian Alloway · public companion page for the Substack post.</footer>
</main>
</body>
</html>