Bucket: hydang/bopask-test-bucket

399 MB

852 files

Updated 29 days ago

Name	Size	Uploaded	Xet hash
core		29 days ago	775 items
lab		29 days ago	74 items
.gitattributes	168 Bytes xet	29 days ago	176e87db
README.md	5.15 kB xet	29 days ago	7dd257c8
_prep_summary.json	2.41 kB xet	29 days ago	bcc6769f

README.md

BOPASK-Test

Human-verified evaluation benchmark for the BOPASK spatial-reasoning VQA dataset.

Contains 934 question-answer pairs across two testsets:

core: BOPASK-Core, three BOP-Challenge families (HANDAL, HOPE, YCB-V).
lab: BOPASK-Lab, an in-the-wild set of "home / lab" scenes.

Contents at a glance

Split	Family	Records	RGB images	Depth maps	Masks
core	handal	251	43	41	138
core	hope	189	50	29	231
core	ycbv	248	48	48	153
lab	home	246	21	12 (⚠)	52
Total		934	162	130	574

Question-type distribution

question_type / subtype	handal	hope	ycbv	home	Total
pose / 2dbbox	39	39	38	39	155
grasp / 2dplane	40	40	40	38	158
spatial_reasoning / relative_position	40	40	40	71	191
trajectory / 2d	40	40	40	48	168
depth_relative / closer	40	—	40	16	96
depth_relative / farther	40	—	40	24	104
object_rearrangement / point_wise	12	30	10	10	62
family total	251	189	248	246	934

Layout

bopask-test/
├── README.md
├── core/                           (BOPASK-Core testset)
│   ├── bopask-test-handal.json
│   ├── bopask-test-hope.json
│   ├── bopask-test-ycbv.json
│   ├── handal/
│   │   ├── images/                 (43 *.png)
│   │   ├── depth_maps/             (41 *_depth.png)
│   │   └── masks/                  (138 *_mask.png)
│   ├── hope/
│   │   └── images/  depth_maps/  masks/
│   └── ycbv/
│       └── images/  depth_maps/  masks/
└── lab/                            (BOPASK-Lab testset)
    ├── bopask-test-home.json
    └── home/
        ├── images/                 (21 *.png)
        ├── depth_maps/             (empty; see the ⚠ entry under "Contents at a glance")
        └── masks/                  (52 masks_<scene>_<object>.png)

All paths inside each JSON are relative to this dataset root, e.g. core/handal/images/scene_000008_frame_000980.png.
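Because the paths are relative, they have to be joined onto wherever the dataset was downloaded. The following is a minimal sketch of that step, assuming a local copy of the dataset; the helper names `resolve_paths` and `missing_files` are hypothetical and not part of the dataset or of any library.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_paths(root: str, relative_paths: Iterable[str]) -> List[Path]:
    """Join dataset-relative paths onto the local dataset root."""
    base = Path(root)
    return [base / rel for rel in relative_paths]


def missing_files(root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as files under root."""
    base = Path(root)
    return [rel for rel in relative_paths if not (base / rel).is_file()]
```

Running `missing_files` over every path referenced by a JSON file is a quick way to spot an incomplete download before starting an evaluation run.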

Quick start

from datasets import load_dataset

# Load one of the configs:
ds = load_dataset("bhatvineet/bopask-test", "core-handal", split="test")
print(ds[0])

# Or load all four families manually:
configs = ["core-handal", "core-hope", "core-ycbv", "lab-home"]
for cfg in configs:
    d = load_dataset("bhatvineet/bopask-test", cfg, split="test")
    print(cfg, len(d))

Loading directly without datasets:

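import json

# Run from the dataset root so the relative path resolves.
with open("core/bopask-test-handal.json") as f:
    records = json.load(f)

for r in records:
    img_path  = r["images"][0]                # e.g. "core/handal/images/..."
    user_q    = r["messages"][0]["content"]   # prompt, including the expected answer format
    gt_answer = r["messages"][1]["content"]   # ground-truth response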
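The same pattern extends to all four annotation files. The sketch below assumes each JSON file is a top-level list of records (as the loop above implies) and takes the file names from the Layout section; `ANNOTATION_FILES`, `load_all`, and `to_sample` are illustrative names, not part of the dataset.

```python
import json
from pathlib import Path
from typing import Dict, List

# Annotation files as listed in the Layout section, relative to the dataset root.
ANNOTATION_FILES = {
    "handal": "core/bopask-test-handal.json",
    "hope": "core/bopask-test-hope.json",
    "ycbv": "core/bopask-test-ycbv.json",
    "home": "lab/bopask-test-home.json",
}


def load_all(root: str) -> Dict[str, List[dict]]:
    """Load every family's records, keyed by family name."""
    out: Dict[str, List[dict]] = {}
    for family, rel in ANNOTATION_FILES.items():
        with open(Path(root) / rel, encoding="utf-8") as f:
            out[family] = json.load(f)
    return out


def to_sample(record: dict) -> dict:
    """Flatten one record into its first image path, question, and answer."""
    return {
        "image": record["images"][0],
        "question": record["messages"][0]["content"],
        "answer": record["messages"][1]["content"],
    }
```

With a complete download, `len(load_all(root)[family])` should match the Records column of the "Contents at a glance" table.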

Evaluation protocols

Each record is a single-turn VQA pair with one ground-truth response in messages[1].content. Answer formats are self-describing: the user prompt tells the model the expected output format (e.g. "respond as a list of 2D points…"). Common metrics by type:

question_type / subtype	typical metric
pose / 2dbbox	2D IoU
grasp / 2dplane	endpoint L2 / success@τ
trajectory / 2d	trajectory-wise DTW, endpoint error
spatial_reasoning / relative_position	exact match (yes/no)
depth_relative	exact match (closer/farther)
object_rearrangement / point_wise	point-in-mask accuracy
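As a rough illustration of the metrics in the table, here is a minimal pure-Python sketch. It is not an official scoring script: the box layout `(x_min, y_min, x_max, y_max)`, the `(x, y)` pixel convention for points, the row-major mask indexing, and the case-insensitive string comparison are all assumptions, and parsing model output into these structures is left out because the answer formats are defined by each prompt.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]   # assumed (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]               # assumed (x, y) in pixel coordinates


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def exact_match(pred: str, gt: str) -> bool:
    """String equality after trimming whitespace and lowercasing (an assumed normalization)."""
    return pred.strip().lower() == gt.strip().lower()


def point_in_mask_accuracy(points: Sequence[Point],
                           mask: Sequence[Sequence[int]]) -> float:
    """Fraction of points that land on a non-zero pixel of a non-empty mask[row][col]."""
    if not points:
        return 0.0
    height, width = len(mask), len(mask[0])
    hits = 0
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            hits += 1
    return hits / len(points)


def dtw_distance(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Dynamic time warping cost between two 2D trajectories (Euclidean point cost)."""
    n, m = len(p), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Thresholds such as the τ in success@τ are not specified here, so any cutoff applied on top of these functions is a choice the evaluator has to make and report.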

Relationship to the training set

This benchmark was curated and human-verified, and it is disjoint from the bhatvineet/bopask-train training split. Use it for evaluation only.

Citation

If you use this dataset, please cite the BOPASK paper and the underlying BOP-Challenge object-pose datasets (HANDAL, HOPE, YCB-V).

License

MIT for the QA annotations. The underlying RGB / depth / mask assets inherit the licenses of their source BOP-Challenge datasets (HANDAL, HOPE, YCB-V) and the bopask-home captures.

Last updated: May 29