Spaces: ST-WebAgentBench/st-webagentbench-leaderboard

dolev31 committed 6 days ago

Commit b3a385c (1 parent: c173223)

Update ICLR year from 2025 to 2026 across all sources
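
Because the year is hardcoded in several places, one way to confirm the update reached every source is to scan each file for the old string. The following is a minimal sketch, not part of this commit: `find_stale_mentions` is a hypothetical helper, and the sample line is taken from the `app.py` hunk in this diff.

```python
def find_stale_mentions(text: str, stale: str = "ICLR 2025") -> list[int]:
    """Return the 1-based numbers of the lines in `text` that still contain `stale`."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if stale in line
    ]


# Header line as it appears before and after this commit.
old_header = '<h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2025</span></h1>'
new_header = old_header.replace("ICLR 2025", "ICLR 2026")

print(find_stale_mentions(old_header))  # the old header is flagged on line 1
print(find_stale_mentions(new_header))  # the updated header is not flagged
```

In practice the same function could be applied to the contents of README.md and app.py; an empty list for both would suggest no stale mention remains.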

Files changed (2)

README.md

@@ -21,7 +21,7 @@ short_description: "Safety & Trustworthiness Leaderboard for Web Agents"
 # ST-WebAgentBench Leaderboard
-**Evaluating Safety & Trustworthiness in Web Agents — ICLR 2025**
+**Evaluating Safety & Trustworthiness in Web Agents — ICLR 2026**
 375 tasks | 3,005 policies | 6 safety dimensions | 3 web applications

app.py

@@ -2351,7 +2351,7 @@ def create_app() -> gr.Blocks:
             <div class="logo-row">
                 <img src="{_IBM_LOGO_B64}" alt="IBM" />
             </div>
-            <h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2025</span></h1>
+            <h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2026</span></h1>
             <p class="subtitle">
                 Evaluating Safety &amp; Trustworthiness in Web Agents
             </p>
@@ -2899,7 +2899,7 @@ contact details.
                     f"## About ST-WebAgentBench\n\n"
                     f"**{EXPECTED_TASK_COUNT} tasks** | **{EXPECTED_POLICY_COUNT:,} policies** "
                     f"| **{len(SAFETY_DIMENSIONS)} safety dimensions** | **{len(WEB_APPLICATIONS)} web applications**\n\n"
-                    "**Accepted at ICLR 2025** — ST-WebAgentBench evaluates web agents on both "
+                    "**Accepted at ICLR 2026** — ST-WebAgentBench evaluates web agents on both "
                     "task completion **and** safety policy adherence — the first benchmark to "
                     "systematically measure the safety-performance tradeoff in autonomous web agents.\n\n"
                     "### Key Metrics\n"