	---
	title: StateBench Explorer
	emoji: 🔍
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.9.1
	app_file: app.py
	pinned: false
	license: mit
	datasets:
	- parslee/statebench
	---

	# StateBench Explorer

	Interactive inspection tool for the [StateBench](https://huggingface.co/datasets/parslee/statebench) benchmark.

	## Features

	- Browse timelines from train/validation/test splits
	- Filter by track (13 evaluation tracks)
	- View events: conversation turns, state writes, supersessions, queries
	- Inspect ground truth: expected decisions, must mention, must not mention
	- Compare baselines: see context built by different memory strategies

	## Usage

	1. Select a split (test, validation, train)
	2. Optionally filter by track
	3. Choose a timeline ID
	4. Select a baseline to see how it builds context
	5. Click Inspect Timeline
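The same selection steps can be sketched outside the UI. The example below filters a list of timeline records by track and lists the matching IDs. The field names `track` and `id` and the helper names are assumptions made for illustration; check the dataset card for the real schema.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def select_timelines(records: Iterable[Record], track: Optional[str] = None) -> List[Record]:
    """Keep records on the given track; with track=None, keep everything.

    Assumption: each record carries a "track" field (hypothetical name).
    """
    return [r for r in records if track is None or r.get("track") == track]


def timeline_ids(records: Iterable[Record]) -> List[str]:
    """Collect timeline IDs, assuming an "id" field (hypothetical name)."""
    return [r["id"] for r in records]


if __name__ == "__main__":
    # Made-up records standing in for one split of the dataset.
    sample = [
        {"id": "t-001", "track": "supersession"},
        {"id": "t-002", "track": "causality"},
        {"id": "t-003", "track": "supersession"},
    ]
    chosen = select_timelines(sample, track="supersession")
    print(timeline_ids(chosen))
```

To run this on real data, the records could come from `datasets.load_dataset("parslee/statebench", split="test")`, assuming the dataset loads with its default configuration; rows of the returned dataset iterate as dictionaries.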

	## Tracks

	| Track | Description |
	|-------|-------------|
	| supersession | Facts invalidated by newer information |
	| commitment_durability | Commitments survive interruptions |
	| scope_permission | Role-based access control |
	| causality | Multi-constraint dependencies |
	| ... | See dataset card for full list |

	## Links

	- [Dataset](https://huggingface.co/datasets/parslee/statebench)
	- [GitHub](https://github.com/Parslee-ai/statebench)
	- [Evaluation Guide](https://github.com/Parslee-ai/statebench#evaluation)