BOPASK-Test
Human-verified evaluation benchmark for the BOPASK spatial-reasoning VQA dataset.
Contains 934 question-answer pairs across two testsets:
- core: BOPASK-Core, covering three BOP-Challenge families (HANDAL, HOPE, YCB-V).
- lab: BOPASK-Lab, an in-the-wild set of "home / lab" scenes.
Contents at a glance
| Split | Family | Records | RGB images | Depth maps | Masks |
|---|---|---|---|---|---|
| core | handal | 251 | 43 | 41 | 138 |
| core | hope | 189 | 50 | 29 | 231 |
| core | ycbv | 248 | 48 | 48 | 153 |
| lab | home | 246 | 21 | 12 (⚠) | 52 |
| Total | | 934 | 162 | 130 | 574 |
Question-type distribution
| question_type / subtype | handal | hope | ycbv | home | Total |
|---|---|---|---|---|---|
| pose / 2dbbox | 39 | 39 | 38 | 39 | 155 |
| grasp / 2dplane | 40 | 40 | 40 | 38 | 158 |
| spatial_reasoning / relative_position | 40 | 40 | 40 | 71 | 191 |
| trajectory / 2d | 40 | 40 | 40 | 48 | 168 |
| depth_relative / closer | 40 | — | 40 | 16 | 96 |
| depth_relative / farther | 40 | — | 40 | 24 | 104 |
| object_rearrangement / point_wise | 12 | 30 | 10 | 10 | 62 |
| family total | 251 | 189 | 248 | 246 | 934 |
Layout
bopask-test/
├── README.md
├── core/                      (BOPASK-Core testset)
│   ├── bopask-test-handal.json
│   ├── bopask-test-hope.json
│   ├── bopask-test-ycbv.json
│   ├── handal/
│   │   ├── images/            (43 *.png)
│   │   ├── depth_maps/        (41 *_depth.png)
│   │   └── masks/             (138 *_mask.png)
│   ├── hope/
│   │   └── images/  depth_maps/  masks/
│   └── ycbv/
│       └── images/  depth_maps/  masks/
└── lab/                       (BOPASK-Lab testset)
    ├── bopask-test-home.json
    └── home/
        ├── images/            (21 *.png)
        ├── depth_maps/        (empty; see caveat above)
        └── masks/             (52 masks_<scene>_<object>.png)
All paths inside each JSON are relative to this dataset root, e.g.
core/handal/images/scene_000008_frame_000980.png.
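Because the stored paths are relative to the dataset root, a loader has to join them with wherever the dataset was downloaded. The following is a minimal sketch of that step; the `resolve` helper and the local directory name `bopask-test` are illustrative assumptions, not part of the dataset tooling.

```python
from pathlib import Path


def resolve(dataset_root, relative_path):
    """Join a path stored in a BOPASK-Test JSON with the local dataset root."""
    return Path(dataset_root) / relative_path


# "bopask-test" is an assumed local download location.
img = resolve("bopask-test", "core/handal/images/scene_000008_frame_000980.png")
print(img.as_posix())
```

Keeping the root as a separate argument means the same JSON files work unchanged regardless of where the dataset lives on disk.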
Quick start
from datasets import load_dataset

# Load a single config:
ds = load_dataset("bhatvineet/bopask-test", "core-handal", split="test")
print(ds[0])

# Or loop over all four family configs:
configs = ["core-handal", "core-hope", "core-ycbv", "lab-home"]
for cfg in configs:
    d = load_dataset("bhatvineet/bopask-test", cfg, split="test")
    print(cfg, len(d))
Loading directly, without the datasets library:
import json

with open("core/bopask-test-handal.json") as f:
    records = json.load(f)

for r in records:
    img_path = r["images"][0]  # e.g. "core/handal/images/..."
    user_q = r["messages"][0]["content"]  # question shown to the model
    gt_answer = r["messages"][1]["content"]  # ground-truth response
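The record fields used above are enough to drive a simple evaluation loop. The sketch below is one possible shape for it, not an official harness: `predict` stands in for a hypothetical model wrapper, `score` for any per-record metric, and the toy record is made up purely to show the expected structure.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(prediction, ground_truth):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(ground_truth))


def evaluate(records, predict, score):
    """Average score(prediction, ground_truth) over a list of records.

    predict(image_path, question) -> str is a hypothetical model wrapper.
    score(prediction, ground_truth) -> float is any per-record metric.
    """
    if not records:
        return 0.0
    total = 0.0
    for r in records:
        question = r["messages"][0]["content"]
        gt_answer = r["messages"][1]["content"]
        prediction = predict(r["images"][0], question)
        total += score(prediction, gt_answer)
    return total / len(records)


# Toy record in the same shape as the JSON entries (contents are made up).
toy = [{
    "images": ["core/handal/images/example.png"],
    "messages": [
        {"content": "Is the mug to the left of the bowl?"},
        {"content": "Yes"},
    ],
}]

# A stand-in "model" that always answers "yes".
print(evaluate(toy, lambda image_path, question: "yes", exact_match))
```

Passing the metric in as a function lets the same loop serve every question type, with only `score` swapped out.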
Evaluation protocols
Each record is a single-turn VQA pair with one ground-truth response in
messages[1].content. Answer formats are self-describing — the user prompt
tells the model the expected output format (e.g. "respond as a list of 2D
points…"). Common metrics by type:
| question_type | typical metric |
|---|---|
| pose / 2dbbox | 2D IoU |
| grasp / 2dplane | endpoint L2 / success@τ |
| trajectory / 2d | trajectory-wise DTW, endpoint error |
| spatial_reasoning / relative_position | exact match (yes/no) |
| depth_relative | exact match (closer/farther) |
| object_rearrangement / point_wise | point-in-mask accuracy |
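Three of the metrics in the table can be sketched in plain Python as follows. These are generic textbook formulations, not the benchmark's reference implementation. The box format `(x_min, y_min, x_max, y_max)`, the pixel-coordinate convention for points, and the `mask[y][x]` layout are all assumptions, and parsing model output into these structures is left out.

```python
import math


def bbox_iou(a, b):
    """2D IoU for boxes given as (x_min, y_min, x_max, y_max) (assumed format)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dtw_distance(p, q):
    """Classic dynamic time warping cost between two sequences of 2D points."""
    n, m = len(p), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])  # Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def point_in_mask_accuracy(points, mask):
    """Fraction of (x, y) points that land on a nonzero mask pixel.

    mask is a 2D row-major array-like indexed as mask[y][x] (assumed layout).
    """
    if not points:
        return 0.0
    height, width = len(mask), len(mask[0])
    hits = 0
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width and mask[yi][xi]:
            hits += 1
    return hits / len(points)


print(bbox_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(dtw_distance([(0, 0), (1, 0)], [(0, 0), (1, 0)]))
print(point_in_mask_accuracy([(0, 0), (1, 1)], [[1, 0], [0, 0]]))
```

Points that fall outside the mask bounds are counted as misses here; a stricter or more lenient policy may be appropriate depending on how the benchmark defines success.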
Relationship to the training set
This benchmark was curated and human-verified to be disjoint from the bhatvineet/bopask-train training split. Use this for evaluation only.
Citation
If you use this dataset, please cite the BOPASK paper and the underlying BOP-Challenge object-pose datasets (HANDAL, HOPE, LineMOD, YCB-V).
License
MIT for the QA annotations. The underlying RGB / depth / mask assets inherit the licenses of their source BOP-Challenge datasets (HANDAL, HOPE, YCB-V) and the bopask-home captures.
- Total size: 399 MB
- Files: 852
- Last updated: May 29