---
title: StateBench Explorer
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
- parslee/statebench
---

# StateBench Explorer

Interactive inspection tool for the [StateBench](https://huggingface.co/datasets/parslee/statebench) benchmark.

## Features

- **Browse timelines** from train/validation/test splits
- **Filter by track** (13 evaluation tracks)
- **View events**: conversation turns, state writes, supersessions, queries
- **Inspect ground truth**: expected decisions, must mention, must not mention
- **Compare baselines**: see context built by different memory strategies

## Usage

1. Select a **split** (test, validation, train)
2. Optionally filter by **track**
3. Choose a **timeline ID**
4. Select a **baseline** to see how it builds context
5. Click **Inspect Timeline**

## Tracks

| Track | Description |
|-------|-------------|
| supersession | Facts invalidated by newer information |
| commitment_durability | Commitments survive interruptions |
| scope_permission | Role-based access control |
| causality | Multi-constraint dependencies |
| ... | See dataset card for full list |

## Links

- [Dataset](https://huggingface.co/datasets/parslee/statebench)
- [GitHub](https://github.com/Parslee-ai/statebench)
- [Evaluation Guide](https://github.com/Parslee-ai/statebench#evaluation)
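
## Selection sketch

The first three steps under Usage (pick a split, optionally filter by track, choose a timeline ID) amount to a simple filter over timeline records. The snippet below is a minimal sketch of that idea, not the Space's actual `app.py`. The record fields `id`, `split`, and `track`, the sample IDs, and the `select_timelines` helper are all hypothetical, and the real StateBench schema may differ.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory records; field names and IDs are illustrative only
# and are not taken from the StateBench dataset.
SAMPLE_TIMELINES: List[Dict[str, str]] = [
    {"id": "tl-001", "split": "test", "track": "supersession"},
    {"id": "tl-002", "split": "test", "track": "causality"},
    {"id": "tl-003", "split": "train", "track": "supersession"},
]


def select_timelines(
    timelines: List[Dict[str, str]],
    split: str,
    track: Optional[str] = None,
) -> List[str]:
    """Return IDs of timelines in `split`, optionally restricted to `track`."""
    return [
        t["id"]
        for t in timelines
        if t["split"] == split and (track is None or t["track"] == track)
    ]


if __name__ == "__main__":
    # Step 1 only: every timeline in the test split.
    print(select_timelines(SAMPLE_TIMELINES, "test"))
    # Steps 1 and 2: test split narrowed to one track.
    print(select_timelines(SAMPLE_TIMELINES, "test", track="supersession"))
```

Treating the track as an optional argument mirrors the Usage flow, where the track filter can be skipped and the timeline ID is then chosen from whatever remains.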