suanlab/structviz-bench-human-eval

---
title: StructViz-Bench Human Eval
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.1
python_version: 3.10.13
app_file: app.py
pinned: false
---

StructViz-Bench Human Evaluation Space

This Hugging Face Space hosts the human evaluation workflow for StructViz-Bench.

What It Supports

Task A: answer correctness verification on 100 stratified items
Task B: visualization-sensitivity plausibility on 50 paired items

Data Files Required

Place these files in data/ before pushing the Space:

task_a_items.jsonl
task_b_pairs.jsonl
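Both files are JSON Lines (one JSON object per line). The sketch below shows one way to check that they are in place and parse them before pushing. It is not the Space's actual loader: `DATA_DIR`, `check_data_dir`, and `parse_jsonl_lines` are hypothetical names, and no particular item schema is assumed.

```python
import json
from pathlib import Path

# Hypothetical layout: the required files live in data/ next to app.py.
DATA_DIR = Path("data")
REQUIRED_FILES = ("task_a_items.jsonl", "task_b_pairs.jsonl")


def parse_jsonl_lines(lines, source="<input>"):
    """Parse an iterable of JSONL lines, skipping blank lines."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report the file and line so broken items are easy to find.
            raise ValueError(f"{source}:{line_no}: invalid JSON ({exc})") from exc
    return records


def load_jsonl(path):
    """Read one JSON object per line from a file on disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_jsonl_lines(fh, source=str(path))


def check_data_dir(data_dir=DATA_DIR):
    """Return the names of required files missing from data_dir."""
    return [name for name in REQUIRED_FILES if not (Path(data_dir) / name).is_file()]
```

An empty list from `check_data_dir()` means both files are present. `load_jsonl("data/task_a_items.jsonl")` should then return the 100 Task A items.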

Place each referenced image at one of these paths:

images/<safe_filename>.png
benchmark/rendered/benchmark/rendered/<modality>/<question_id>_<viz_type>.png
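The two accepted locations can be tried in order. This is a minimal sketch under assumptions: paths are taken relative to the Space root, `images/` is checked first, and `safe_filename` is treated as a value the caller already has. The function names are hypothetical. Returning `None` lets the app run before any images have been uploaded.

```python
from pathlib import Path

# Second accepted location, copied verbatim from the path listed above.
RENDERED_ROOT = Path("benchmark/rendered/benchmark/rendered")


def candidate_image_paths(root, safe_filename, modality, question_id, viz_type):
    """List the accepted locations for one image, in lookup order (assumed)."""
    root = Path(root)
    return [
        root / "images" / f"{safe_filename}.png",
        root / RENDERED_ROOT / modality / f"{question_id}_{viz_type}.png",
    ]


def resolve_image(root, safe_filename, modality, question_id, viz_type):
    """Return the first existing candidate, or None if the PNG is not present."""
    for path in candidate_image_paths(root, safe_filename, modality, question_id, viz_type):
        if path.is_file():
            return path
    return None
```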

Response Storage Format

Responses are written to both JSONL and CSV under responses/:

responses/task_a_responses.jsonl
responses/task_a_responses.csv
responses/task_b_responses.jsonl
responses/task_b_responses.csv

Each record contains:

timestamp
session_id
evaluator
task
item_index
question_id
task-specific metadata: modality, difficulty, source, and viz_type (Task A) or viz_a/viz_b (Task B)
rating
notes
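Writing one record to both formats can look like the following. This is a sketch, not the app's code. It assumes the `task` field holds `task_a` or `task_b`, so that it matches the file names above, and that every record for a given task has the same keys. `make_record` and `append_response` are hypothetical names.

```python
import csv
import json
import threading
from datetime import datetime, timezone
from pathlib import Path

_WRITE_LOCK = threading.Lock()  # serialize appends within one app process


def make_record(session_id, evaluator, task, item_index, question_id, metadata, rating, notes=""):
    """Build a record in the field order listed above."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "evaluator": evaluator,
        "task": task,  # assumed to be "task_a" or "task_b"
        "item_index": item_index,
        "question_id": question_id,
    }
    record.update(metadata)  # e.g. modality, difficulty, source, viz_type
    record["rating"] = rating
    record["notes"] = notes
    return record


def append_response(record, responses_dir="responses"):
    """Append the record to <task>_responses.jsonl and <task>_responses.csv."""
    out = Path(responses_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = f"{record['task']}_responses"
    csv_path = out / f"{stem}.csv"
    with _WRITE_LOCK:
        with open(out / f"{stem}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        # Write the CSV header only when the file is new or empty.
        write_header = not csv_path.exists() or csv_path.stat().st_size == 0
        with open(csv_path, "a", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(record))
            if write_header:
                writer.writeheader()
            writer.writerow(record)
```

JSONL preserves types such as integer ratings. The CSV copy stores everything as text, but it opens directly in a spreadsheet.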

Deployment

From the project root, build a minimal Space bundle with:

python3 scripts/export_human_eval_space.py

Then push release/huggingface/human_eval_space/ to a new Hugging Face Space.
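A plain `git push` to the Space repository works. Another option is the `huggingface_hub` Python client, which provides `HfApi.create_repo` and `HfApi.upload_folder`. The sketch below is an assumption-laden example, not part of the project's scripts. `push_bundle` is a hypothetical helper, the `app.py` check only reflects `app_file` in the metadata, and the `api` parameter exists so the function can be exercised without network access.

```python
from pathlib import Path

BUNDLE_DIR = Path("release/huggingface/human_eval_space")


def push_bundle(repo_id, folder=BUNDLE_DIR, api=None, private=False):
    """Create (or reuse) a Gradio Space and upload the exported bundle to it."""
    if api is None:
        from huggingface_hub import HfApi  # requires `pip install huggingface_hub`
        api = HfApi()  # picks up a saved login token or the HF_TOKEN env var
    folder = Path(folder)
    if not (folder / "app.py").is_file():
        raise FileNotFoundError(f"{folder} has no app.py; run the export script first")
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio",
                    private=private, exist_ok=True)
    api.upload_folder(repo_id=repo_id, repo_type="space", folder_path=str(folder),
                      commit_message="Upload human eval Space bundle")
```

For example, `push_bundle("your-username/structviz-bench-human-eval")` uses a placeholder repo id; replace it with your own namespace.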

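To keep responses across restarts, enable persistent storage for the Space and make sure the responses/ directory lives on the persistent volume, which Hugging Face mounts at /data. Files written anywhere else are lost when the Space restarts.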
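One way to handle both cases is to pick the output directory at startup. This is a sketch that assumes the app can be pointed at an arbitrary responses folder. `responses_dir` is a hypothetical helper, and `/data` here is the mounted volume, not the repo's own data/ folder.

```python
import os
from pathlib import Path


def responses_dir(persistent_root="/data", fallback="responses"):
    """Prefer <persistent_root>/responses when the volume is mounted and writable."""
    root = Path(persistent_root)
    if root.is_dir() and os.access(root, os.W_OK):
        target = root / "responses"
    else:
        target = Path(fallback)  # ephemeral: lost when the Space restarts
    target.mkdir(parents=True, exist_ok=True)
    return target
```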

Note: you can push this repo without images at first and add the PNG assets later using Hugging Face Xet/LFS-compatible storage.