---
title: StateBench Explorer
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
datasets:
  - parslee/statebench
---

# StateBench Explorer

Interactive inspection tool for the [StateBench](https://huggingface.co/datasets/parslee/statebench) benchmark.

## Features

- **Browse timelines** from train/validation/test splits
- **Filter by track** (13 evaluation tracks)
- **View events**: conversation turns, state writes, supersessions, queries
- **Inspect ground truth**: expected decisions, plus what a response must mention and must not mention
- **Compare baselines**: see the context each memory strategy builds from the same timeline
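
The "must mention / must not mention" ground truth can be read as two phrase lists that a response is checked against. The sketch below assumes a hypothetical `GroundTruth` record with `must_mention` and `must_not_mention` fields and uses a case-insensitive substring match. It is an illustration only, not the benchmark's official scorer or its actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    # Hypothetical field names modeled on the explorer's ground-truth view;
    # the real StateBench schema may differ.
    must_mention: list[str] = field(default_factory=list)
    must_not_mention: list[str] = field(default_factory=list)


def check_response(response: str, truth: GroundTruth) -> dict[str, list[str]]:
    """Return required phrases the response omits and forbidden phrases it contains.

    Case-insensitive substring matching is an assumption made for illustration.
    """
    text = response.lower()
    missing = [p for p in truth.must_mention if p.lower() not in text]
    leaked = [p for p in truth.must_not_mention if p.lower() in text]
    return {"missing": missing, "leaked": leaked}


if __name__ == "__main__":
    truth = GroundTruth(must_mention=["budget is $50k"], must_not_mention=["$40k"])
    # The response states the current value but also repeats a stale one.
    print(check_response("Approved: the budget is $50k (was $40k).", truth))
    # {'missing': [], 'leaked': ['$40k']}
```

A response can satisfy every required mention and still fail by repeating a value that should no longer be used.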

## Usage

1. Select a **split** (test, validation, train)
2. Optionally filter by **track**
3. Choose a **timeline ID**
4. Select a **baseline** to see how it builds context
5. Click **Inspect Timeline**
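
Steps 1 to 3 reduce to a split lookup, an optional track filter, and an ID match. The sketch below mirrors that flow on plain Python data. It assumes the splits are already in memory as lists of dicts with hypothetical `timeline_id` and `track` keys. The IDs `t-001` and `t-002` are invented for the example.

```python
from typing import Optional

# Hypothetical layout: split name -> list of timeline records.
# The "timeline_id" and "track" keys are assumptions, not the documented schema.
Splits = dict[str, list[dict]]


def list_timeline_ids(splits: Splits, split: str, track: Optional[str] = None) -> list[str]:
    """Steps 1-2: pick a split, then optionally keep only one track."""
    return [
        t["timeline_id"]
        for t in splits[split]
        if track is None or t["track"] == track
    ]


def get_timeline(splits: Splits, split: str, timeline_id: str) -> dict:
    """Step 3: fetch the chosen timeline by its ID."""
    for t in splits[split]:
        if t["timeline_id"] == timeline_id:
            return t
    raise KeyError(f"{timeline_id!r} not in split {split!r}")


if __name__ == "__main__":
    splits = {
        "test": [
            {"timeline_id": "t-001", "track": "supersession"},
            {"timeline_id": "t-002", "track": "causality"},
        ],
    }
    print(list_timeline_ids(splits, "test", track="causality"))  # ['t-002']
    print(get_timeline(splits, "test", "t-001")["track"])  # supersession
```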

## Tracks

| Track | Description |
|-------|-------------|
| supersession | Facts invalidated by newer information |
| commitment_durability | Commitments survive interruptions |
| scope_permission | Role-based access control |
| causality | Multi-constraint dependencies |
| ... | See the dataset card for the full list |
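
To see why the supersession track separates memory strategies, compare two hypothetical baselines over one timeline. One replays every event. The other keeps only the latest value written for each fact. This is a minimal sketch. The `Event` shape and both strategies are assumptions, and the explorer lists supersessions as their own event type. Here a supersession is modeled implicitly as a later write to the same key.

```python
from typing import NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape; the real timeline format may differ.
    kind: str   # "turn", "state_write", or "query"
    key: str    # fact name for state writes, speaker otherwise
    value: str


def transcript_context(events: list[Event]) -> list[str]:
    """Naive baseline: replay every non-query event, stale facts included."""
    return [f"{e.key}: {e.value}" for e in events if e.kind != "query"]


def current_state_context(events: list[Event]) -> list[str]:
    """State-aware baseline: a later write to a key supersedes the earlier one."""
    state: dict[str, str] = {}
    for e in events:
        if e.kind == "state_write":
            state[e.key] = e.value  # overwriting drops the superseded value
    return [f"{k}: {v}" for k, v in state.items()]


if __name__ == "__main__":
    events = [
        Event("state_write", "budget", "$40k"),
        Event("turn", "user", "Finance approved more money."),
        Event("state_write", "budget", "$50k"),
        Event("query", "user", "What is the budget?"),
    ]
    print(transcript_context(events))
    # ['budget: $40k', 'user: Finance approved more money.', 'budget: $50k']
    print(current_state_context(events))
    # ['budget: $50k']
```

The transcript baseline still hands the model the invalidated `$40k`. The state-aware baseline drops it, but it also discards the conversational turn that explains the change.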

## Links

- [Dataset](https://huggingface.co/datasets/parslee/statebench)
- [GitHub](https://github.com/Parslee-ai/statebench)
- [Evaluation Guide](https://github.com/Parslee-ai/statebench#evaluation)