
BOPASK-Test

Human-verified evaluation benchmark for the BOPASK spatial-reasoning VQA dataset.

It contains 934 question-answer pairs across two test sets:

  • core (BOPASK-Core): three BOP-Challenge families (HANDAL, HOPE, YCB-V).
  • lab (BOPASK-Lab): an in-the-wild set of "home / lab" scenes.

Contents at a glance

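| Split | Family | Records | RGB images | Depth maps | Masks |
|-------|--------|---------|------------|------------|-------|
| core  | handal | 251     | 43         | 41         | 138   |
| core  | hope   | 189     | 50         | 29         | 231   |
| core  | ycbv   | 248     | 48         | 48         | 153   |
| lab   | home   | 246     | 21         | 12 (⚠)     | 52    |
| Total |        | 934     | 162        | 130        | 574   |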

Question-type distribution

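| question_type / subtype               | handal | hope | ycbv | home | Total |
|---------------------------------------|--------|------|------|------|-------|
| pose / 2dbbox                         | 39     | 39   | 38   | 39   | 155   |
| grasp / 2dplane                       | 40     | 40   | 40   | 38   | 158   |
| spatial_reasoning / relative_position | 40     | 40   | 40   | 71   | 191   |
| trajectory / 2d                       | 40     | 40   | 40   | 48   | 168   |
| depth_relative / closer               | 40     | -    | 40   | 16   | 96    |
| depth_relative / farther              | 40     | -    | 40   | 24   | 104   |
| object_rearrangement / point_wise     | 12     | 30   | 10   | 10   | 62    |
| family total                          | 251    | 189  | 248  | 246  | 934   |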
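
The row and column sums in this table can be checked mechanically. The sketch below transcribes the counts into a dictionary and recomputes the totals. It treats the two depth_relative rows as having no hope entries (stored as 0), which is the only reading consistent with the hope family total of 189. The names `FAMILIES` and `COUNTS` are illustrative and not part of the dataset.

```python
# Counts transcribed from the question-type table above.
# Assumption: the depth_relative rows have no hope entries (stored as 0),
# the only assignment under which the hope column sums to 189.
FAMILIES = ["handal", "hope", "ycbv", "home"]
COUNTS = {
    "pose / 2dbbox": [39, 39, 38, 39],
    "grasp / 2dplane": [40, 40, 40, 38],
    "spatial_reasoning / relative_position": [40, 40, 40, 71],
    "trajectory / 2d": [40, 40, 40, 48],
    "depth_relative / closer": [40, 0, 40, 16],
    "depth_relative / farther": [40, 0, 40, 24],
    "object_rearrangement / point_wise": [12, 30, 10, 10],
}

# Per-question-type totals (the right-hand "Total" column).
row_totals = {name: sum(row) for name, row in COUNTS.items()}

# Per-family totals (the bottom "family total" row).
family_totals = {
    family: sum(row[i] for row in COUNTS.values())
    for i, family in enumerate(FAMILIES)
}

print(family_totals)                # {'handal': 251, 'hope': 189, 'ycbv': 248, 'home': 246}
print(sum(family_totals.values()))  # 934
```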

Layout

bopask-test/
├── README.md
├── core/                           (BOPASK-Core testset)
│   ├── bopask-test-handal.json
│   ├── bopask-test-hope.json
│   ├── bopask-test-ycbv.json
│   ├── handal/
│   │   ├── images/                 (43 *.png)
│   │   ├── depth_maps/             (41 *_depth.png)
│   │   └── masks/                  (138 *_mask.png)
│   ├── hope/
│   │   └── images/  depth_maps/  masks/
│   └── ycbv/
│       └── images/  depth_maps/  masks/
└── lab/                            (BOPASK-Lab testset)
    ├── bopask-test-home.json
    └── home/
        ├── images/                 (21 *.png)
        ├── depth_maps/             (empty; see the ⚠ flag in "Contents at a glance")
        └── masks/                  (52 masks_<scene>_<object>.png)

All paths inside each JSON are relative to this dataset root, e.g. core/handal/images/scene_000008_frame_000980.png.
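
Because the JSON paths are relative to the dataset root, they need to be joined onto wherever the dataset lives locally. The following is a minimal sketch, assuming the repository has been downloaded to a local directory. `resolve_asset` and `count_pngs` are hypothetical helper names, and the `bopask-test` root in the usage lines is an assumed location.

```python
from pathlib import Path


def resolve_asset(dataset_root, relative_path):
    """Join a JSON-relative path onto the local dataset root."""
    return Path(dataset_root) / relative_path


def count_pngs(dataset_root, split, family):
    """Count *.png files in each asset folder of one family, following the layout above."""
    base = Path(dataset_root) / split / family
    return {
        kind: len(list((base / kind).glob("*.png")))
        for kind in ("images", "depth_maps", "masks")
    }


if __name__ == "__main__":
    root = Path("bopask-test")  # assumed local download location
    image = resolve_asset(root, "core/handal/images/scene_000008_frame_000980.png")
    print(image, image.exists())
    print(count_pngs(root, "core", "handal"))
```

Comparing the output of `count_pngs` with the counts listed in "Contents at a glance" is a quick way to confirm that a download is complete.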

Quick start

from datasets import load_dataset

# Load one of the configs:
ds = load_dataset("bhatvineet/bopask-test", "core-handal", split="test")
print(ds[0])

# Or load all four families manually:
configs = ["core-handal", "core-hope", "core-ycbv", "lab-home"]
for cfg in configs:
    d = load_dataset("bhatvineet/bopask-test", cfg, split="test")
    print(cfg, len(d))

Loading the JSON files directly, without the datasets library:

import json
with open("core/bopask-test-handal.json") as f:
    records = json.load(f)

for r in records:
    img_path  = r["images"][0]          # e.g. "core/handal/images/..."
    user_q    = r["messages"][0]["content"]
    gt_answer = r["messages"][1]["content"]
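
The loop above can be wrapped in a small helper that also checks each record has the expected shape. This is a sketch, assuming only the fields used above (`images`, and `messages` with a `content` entry for the question and the answer). `QAPair`, `iter_qa_pairs` and `load_qa_pairs` are hypothetical names, not part of the dataset.

```python
import json
from typing import Iterator, NamedTuple


class QAPair(NamedTuple):
    image_path: str
    question: str
    answer: str


def iter_qa_pairs(records) -> Iterator[QAPair]:
    """Yield (image_path, question, answer) for each single-turn record."""
    for index, record in enumerate(records):
        images = record.get("images", [])
        messages = record.get("messages", [])
        # A usable record needs at least one image and a question/answer pair.
        if not images or len(messages) < 2:
            raise ValueError(f"record {index} is missing an image or a message pair")
        yield QAPair(images[0], messages[0]["content"], messages[1]["content"])


def load_qa_pairs(json_path):
    """Read one testset JSON file and return its records as QAPair tuples."""
    with open(json_path, encoding="utf-8") as f:
        return list(iter_qa_pairs(json.load(f)))
```

For example, `load_qa_pairs("core/bopask-test-handal.json")` would return one `QAPair` per record in the HANDAL file.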

Evaluation protocols

Each record is a single-turn VQA pair with one ground-truth response in messages[1].content. Answer formats are self-describing: the user prompt tells the model the expected output format (e.g. "respond as a list of 2D points…"). Common metrics by type:

| question_type                         | typical metric                      |
|---------------------------------------|-------------------------------------|
| pose / 2dbbox                         | 2D IoU                              |
| grasp / 2dplane                       | endpoint L2 / success@τ             |
| trajectory / 2d                       | trajectory-wise DTW, endpoint error |
| spatial_reasoning / relative_position | exact match (yes/no)                |
| depth_relative                        | exact match (closer/farther)        |
| object_rearrangement / point_wise     | point-in-mask accuracy              |
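
The metrics named above could be sketched in plain Python as follows. This is not the official scoring code. It assumes that boxes are parsed into `(x_min, y_min, x_max, y_max)` pixel tuples, that trajectories and point lists are sequences of `(x, y)` tuples, and that masks are 2D arrays indexed as `mask[row][col]`. Parsing model output into these structures depends on the format requested in each prompt and is left out.

```python
import math


def bbox_iou(box_a, box_b):
    """2D IoU for boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def exact_match(prediction, reference):
    """Case- and whitespace-insensitive comparison for yes/no or closer/farther answers."""
    return prediction.strip().lower() == reference.strip().lower()


def endpoint_error(pred_points, gt_points):
    """L2 distance between the final points of two point sequences."""
    return math.dist(pred_points[-1], gt_points[-1])


def dtw_distance(pred_points, gt_points):
    """Classic dynamic time warping cost with Euclidean point distance."""
    n, m = len(pred_points), len(gt_points)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(pred_points[i - 1], gt_points[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def point_in_mask_accuracy(points, mask):
    """Fraction of (x, y) points that land on a non-zero mask pixel."""
    if not points:
        return 0.0
    height, width = len(mask), len(mask[0])
    hits = 0
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            hits += 1
    return hits / len(points)
```

Under these assumptions, success@τ reduces to `endpoint_error(pred, gt) <= tau` for a threshold `tau` that this README does not specify and that has to be chosen by the evaluator.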

Relationship to the training set

This benchmark was curated and human-verified to be disjoint from the bhatvineet/bopask-train training split. Use it for evaluation only.

Citation

If you use this dataset, please cite the BOPASK paper and the underlying BOP-Challenge object-pose datasets (HANDAL, HOPE, LineMOD, YCB-V).

License

MIT for the QA annotations. The underlying RGB / depth / mask assets inherit the licenses of their source BOP-Challenge datasets (HANDAL, HOPE, YCB-V) and the bopask-home captures.

Total size: 399 MB
Files: 852
Last updated: May 29