---
title: StateBench Explorer
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
- parslee/statebench
---
# StateBench Explorer
Interactive inspection tool for the StateBench benchmark.
## Features
- Browse timelines from train/validation/test splits
- Filter by track (13 evaluation tracks)
- View events: conversation turns, state writes, supersessions, queries
- Inspect ground truth: expected decisions, plus what a response must mention and must not mention
- Compare baselines: see the context that each memory strategy builds
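The "must mention / must not mention" ground truth suggests a simple pass/fail check on a model response. The sketch below is a hypothetical illustration, not StateBench's actual scorer: `GroundTruth` and `check_response` are invented names, and the case-insensitive substring matching is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    # Hypothetical fields mirroring the "must mention" / "must not mention" idea.
    must_mention: list[str] = field(default_factory=list)
    must_not_mention: list[str] = field(default_factory=list)


def check_response(response: str, truth: GroundTruth) -> dict[str, list[str]]:
    """Return required phrases that are missing and forbidden phrases that leaked.

    Uses case-insensitive substring matching (an assumption; the benchmark's
    real scoring may match differently).
    """
    text = response.lower()
    missing = [p for p in truth.must_mention if p.lower() not in text]
    leaked = [p for p in truth.must_not_mention if p.lower() in text]
    return {"missing": missing, "leaked": leaked}


if __name__ == "__main__":
    truth = GroundTruth(
        must_mention=["Friday deadline"],
        must_not_mention=["Thursday deadline"],
    )
    # A response that cites the current fact and avoids the stale one.
    print(check_response("We are still on track for the Friday deadline.", truth))
    # {'missing': [], 'leaked': []}
```

A response passes this sketch only when both lists come back empty.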
## Usage
1. Select a split (test, validation, or train).
2. Optionally filter by track.
3. Choose a timeline ID.
4. Select a baseline to see how it builds context.
5. Click **Inspect Timeline**.
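The first three usage steps amount to narrowing a list of timelines by split and, optionally, by track. In the app the records would presumably come from the `parslee/statebench` dataset; here they are hard-coded so the sketch runs offline, and the record keys (`id`, `split`, `track`) and the `timeline_ids` helper are hypothetical.

```python
from typing import Optional

# Hypothetical in-memory records; the real dataset's field names may differ.
RECORDS = [
    {"id": "tl-001", "split": "test", "track": "supersession"},
    {"id": "tl-002", "split": "test", "track": "causality"},
    {"id": "tl-003", "split": "validation", "track": "supersession"},
    {"id": "tl-004", "split": "test", "track": "supersession"},
]


def timeline_ids(records: list[dict], split: str, track: Optional[str] = None) -> list[str]:
    """Pick a split, optionally narrow to one track, and list the matching IDs."""
    return sorted(
        r["id"]
        for r in records
        if r["split"] == split and (track is None or r["track"] == track)
    )


print(timeline_ids(RECORDS, "test"))                  # ['tl-001', 'tl-002', 'tl-004']
print(timeline_ids(RECORDS, "test", "supersession"))  # ['tl-001', 'tl-004']
```

Leaving `track` as `None` corresponds to skipping the optional track filter.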
## Tracks
| Track | Description |
|---|---|
| supersession | Facts invalidated by newer information |
| commitment_durability | Commitments survive interruptions |
| scope_permission | Role-based access control |
| causality | Multi-constraint dependencies |
| ... | See the dataset card for the full list of 13 tracks |
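To make the supersession track concrete, the sketch below contrasts a naive memory strategy that replays every fact ever written with one that drops facts a newer write has invalidated. `StateWrite`, `naive_context`, and `supersession_aware_context` are hypothetical names for illustration only; they are not the benchmark's event schema or its actual baselines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateWrite:
    # Hypothetical event shape: a fact with a stable ID and the ID it replaces, if any.
    fact_id: str
    text: str
    supersedes: Optional[str] = None


def naive_context(writes: list[StateWrite]) -> list[str]:
    """Baseline that keeps every fact ever written, including stale ones."""
    return [w.text for w in writes]


def supersession_aware_context(writes: list[StateWrite]) -> list[str]:
    """Drop any fact that another write explicitly supersedes."""
    superseded = {w.supersedes for w in writes if w.supersedes is not None}
    return [w.text for w in writes if w.fact_id not in superseded]


writes = [
    StateWrite("f1", "Budget cap is $10k"),
    StateWrite("f2", "Budget cap is $15k", supersedes="f1"),
    StateWrite("f3", "Launch is in March"),
]
print(naive_context(writes))               # includes the stale $10k cap
print(supersession_aware_context(writes))  # ['Budget cap is $15k', 'Launch is in March']
```

A timeline in this track would catch the naive strategy when a query's correct answer depends on the newer fact.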