---
title: StateBench Explorer
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
- parslee/statebench
---
# StateBench Explorer
Interactive inspection tool for the [StateBench](https://huggingface.co/datasets/parslee/statebench) benchmark.
## Features
- **Browse timelines** from train/validation/test splits
- **Filter by track** (13 evaluation tracks)
- **View events**: conversation turns, state writes, supersessions, queries
- **Inspect ground truth**: expected decisions, must mention, must not mention
- **Compare baselines**: see context built by different memory strategies
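To make the ground-truth fields concrete, here is a minimal sketch of how a response could be checked against "must mention" and "must not mention" phrases. The `GroundTruth` class, its field names, and the case-insensitive substring match are illustrative assumptions, not the benchmark's actual schema or scoring method.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    # Hypothetical container; the dataset's real field names may differ.
    must_mention: list[str] = field(default_factory=list)
    must_not_mention: list[str] = field(default_factory=list)


def check_response(response: str, truth: GroundTruth) -> dict[str, list[str]]:
    """Report required phrases that are missing and forbidden phrases that appear."""
    text = response.lower()
    # Assumption: plain case-insensitive substring matching.
    missing = [p for p in truth.must_mention if p.lower() not in text]
    forbidden = [p for p in truth.must_not_mention if p.lower() in text]
    return {"missing": missing, "forbidden": forbidden}


truth = GroundTruth(must_mention=["Tuesday"], must_not_mention=["Monday"])
result = check_response("The meeting moved to Tuesday.", truth)
print(result)
```

A response passes this simplified check when both lists come back empty. See the Evaluation Guide for how the benchmark itself scores responses.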
## Usage
1. Select a **split** (test, validation, train)
2. Optionally filter by **track**
3. Choose a **timeline ID**
4. Select a **baseline** to see how it builds context
5. Click **Inspect Timeline**
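The track filter and timeline selection in the steps above can be sketched in plain Python. The records and the `id` and `track` keys below are hypothetical stand-ins for one split, and `timeline_ids` is an illustrative helper rather than a function from the app.

```python
from typing import Optional

# Hypothetical records standing in for one split of the dataset.
timelines = [
    {"id": "tl-001", "track": "supersession"},
    {"id": "tl-002", "track": "causality"},
    {"id": "tl-003", "track": "supersession"},
]


def timeline_ids(records: list[dict], track: Optional[str] = None) -> list[str]:
    """Return timeline IDs, restricted to one track when a track is given."""
    return [r["id"] for r in records if track is None or r["track"] == track]


# Step 2 (optional track filter) narrows the choices offered in step 3.
choices = timeline_ids(timelines, track="supersession")
print(choices)
```

Leaving `track` as `None` mirrors skipping the optional filter, so every timeline in the split stays selectable.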
## Tracks
| Track | Description |
|-------|-------------|
| supersession | Facts invalidated by newer information |
| commitment_durability | Commitments survive interruptions |
| scope_permission | Role-based access control |
| causality | Multi-constraint dependencies |
| ... | See dataset card for full list |
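As a rough illustration of what the supersession track probes, the sketch below resolves a sequence of state writes so that a newer value replaces an older one for the same key. The `(key, value)` tuple format and the `current_facts` helper are assumptions made for this example, not the dataset's event schema.

```python
def current_facts(writes: list[tuple[str, str]]) -> dict[str, str]:
    """Keep only the most recent value written for each key."""
    facts: dict[str, str] = {}
    # Assumption: writes arrive in chronological order, so later ones win.
    for key, value in writes:
        facts[key] = value
    return facts


writes = [
    ("meeting_day", "Monday"),
    ("owner", "Dana"),
    ("meeting_day", "Tuesday"),  # supersedes the earlier "Monday" write
]
state = current_facts(writes)
print(state)
```

A memory strategy that still surfaces "Monday" after the third write would be relying on an invalidated fact.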
## Links
- [Dataset](https://huggingface.co/datasets/parslee/statebench)
- [GitHub](https://github.com/Parslee-ai/statebench)
- [Evaluation Guide](https://github.com/Parslee-ai/statebench#evaluation)