dolev31 committed on
Commit b3a385c · 1 Parent(s): c173223

Update ICLR year from 2025 to 2026 across all sources

Files changed (2)
  1. README.md +1 -1
  2. app.py +2 -2
README.md CHANGED
@@ -21,7 +21,7 @@ short_description: "Safety & Trustworthiness Leaderboard for Web Agents"
 
  # ST-WebAgentBench Leaderboard
 
- **Evaluating Safety & Trustworthiness in Web Agents — ICLR 2025**
+ **Evaluating Safety & Trustworthiness in Web Agents — ICLR 2026**
 
  375 tasks | 3,005 policies | 6 safety dimensions | 3 web applications
 
app.py CHANGED
@@ -2351,7 +2351,7 @@ def create_app() -> gr.Blocks:
  <div class="logo-row">
  <img src="{_IBM_LOGO_B64}" alt="IBM" />
  </div>
- <h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2025</span></h1>
+ <h1>ST-WebAgentBench <span class="iclr-badge">ICLR 2026</span></h1>
  <p class="subtitle">
  Evaluating Safety &amp; Trustworthiness in Web Agents
  </p>
@@ -2899,7 +2899,7 @@ contact details.
  f"## About ST-WebAgentBench\n\n"
  f"**{EXPECTED_TASK_COUNT} tasks** | **{EXPECTED_POLICY_COUNT:,} policies** "
  f"| **{len(SAFETY_DIMENSIONS)} safety dimensions** | **{len(WEB_APPLICATIONS)} web applications**\n\n"
- "**Accepted at ICLR 2025** — ST-WebAgentBench evaluates web agents on both "
+ "**Accepted at ICLR 2026** — ST-WebAgentBench evaluates web agents on both "
  "task completion **and** safety policy adherence — the first benchmark to "
  "systematically measure the safety-performance tradeoff in autonomous web agents.\n\n"
  "### Key Metrics\n"