Update ICLR year from 2025 to 2026 across all sources
README.md
CHANGED
@@ -21,7 +21,7 @@ short_description: "Safety & Trustworthiness Leaderboard for Web Agents"
 
 # ST-WebAgentBench Leaderboard
 
-**Evaluating Safety & Trustworthiness in Web Agents — ICLR 2025**
+**Evaluating Safety & Trustworthiness in Web Agents — ICLR 2026**
 
 375 tasks | 3,005 policies | 6 safety dimensions | 3 web applications
 
app.py
CHANGED
@@ -2351,7 +2351,7 @@ def create_app() -> gr.Blocks:
 <div class="logo-row">
 <img src="{_IBM_LOGO_B64}" alt="IBM" />
 </div>
-<h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2025</span></h1>
+<h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2026</span></h1>
 <p class="subtitle">
 Evaluating Safety & Trustworthiness in Web Agents
 </p>
@@ -2899,7 +2899,7 @@ contact details.
 f"## About ST-WebAgentBench\n\n"
 f"**{EXPECTED_TASK_COUNT} tasks** | **{EXPECTED_POLICY_COUNT:,} policies** "
 f"| **{len(SAFETY_DIMENSIONS)} safety dimensions** | **{len(WEB_APPLICATIONS)} web applications**\n\n"
-"**Accepted at ICLR 2025** — ST-WebAgentBench evaluates web agents on both "
+"**Accepted at ICLR 2026** — ST-WebAgentBench evaluates web agents on both "
 "task completion **and** safety policy adherence — the first benchmark to "
 "systematically measure the safety-performance tradeoff in autonomous web agents.\n\n"
 "### Key Metrics\n"
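Review note: this commit edits the same venue literal in several places. One way to make a future update a single-line change might be to route the `app.py` strings through one constant. The sketch below is only an illustration under stated assumptions: `VENUE`, `header_html`, and `about_intro` are hypothetical names that do not appear in the diff, and the contents of `SAFETY_DIMENSIONS` and `WEB_APPLICATIONS` are placeholders sized to match the counts in the README (6 and 3).

```python
# Hypothetical constant; the actual app.py hard-codes the year in each string.
VENUE = "ICLR 2026"

# Counts taken from the README line in this commit.
EXPECTED_TASK_COUNT = 375
EXPECTED_POLICY_COUNT = 3005

# Placeholder entries: only the lengths (6 and 3) come from the README.
SAFETY_DIMENSIONS = [f"dimension_{i}" for i in range(1, 7)]
WEB_APPLICATIONS = [f"app_{i}" for i in range(1, 4)]


def header_html() -> str:
    """Build the page title with the venue badge from the shared constant."""
    return f'<h1>ST-WebAgentBench <span class="iclr-badge">{VENUE}</span></h1>'


def about_intro() -> str:
    """Build the opening of the About text, reusing the same constant."""
    return (
        f"## About ST-WebAgentBench\n\n"
        f"**{EXPECTED_TASK_COUNT} tasks** | **{EXPECTED_POLICY_COUNT:,} policies** "
        f"| **{len(SAFETY_DIMENSIONS)} safety dimensions** "
        f"| **{len(WEB_APPLICATIONS)} web applications**\n\n"
        f"**Accepted at {VENUE}**"
    )


if __name__ == "__main__":
    print(header_html())
    print(about_intro())
```

`README.md` is static Markdown, so it would still need a manual edit (or a generation step) even with this approach.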