Dataset

To train our affordance segmentation model, we use two types of data:

General Segmentation Data: This follows LISA.
Affordance Segmentation Data: This is a large-scale dataset that we collect.

General Segmentation Data

This data is organized as follows:

./data/
├── lisa_data
│   ├── ade20k
│   ├── coco
│   ├── cocostuff
│   ├── llava_dataset
│   ├── mapillary
│   ├── reason_seg
│   ├── refer_seg
│   ├── vlpart

Affordance Segmentation Data

We employ images from HANDAL, Open-X, GraspNet, EgoObjects, and RLBench in our affordance segmentation task.

The HANDAL data is downloaded and organized according to its official repo. The other data can be downloaded from Hugging Face.

The training data is organized as follows:

./data/
├── openx_train.pkl
├── graspnet_train.pkl
├── egoobjects_train.pkl
├── rlbench_train.pkl
├── handal_hard_reasoning_train.pkl
├── egoobjects_easy_reasoning_train.pkl
├── egoobjects_hard_reasoning_train.pkl
├── HANDAL
│   ├── without_depth
│       ├── handal_dataset_adjustable_wrenches
│       ├── handal_dataset_combinational_wrenches
│       ├── handal_dataset_fixed_joint_pliers
│       ├── ...
├── openx
│   ├── images
│       ├── fractal20220817_data
│       ├── bridge
│   ├── masks
│       ├── fractal20220817_data
│       ├── bridge
├── graspnet
│   ├── images
│   ├── masks
│   ├── test_seen
│   ├── test_novel
├── egoobjects
│   ├── images
│   ├── masks
├── rlbench
│   ├── images
│   ├── masks
├── 3doi
│   ├── images
│   ├── masks

The evaluation data is stored in the same directory, but uses the *_val.pkl files instead of *_train.pkl:

./data/
├── handal_mini_val.pkl
├── graspnet_test_seen_val.pkl
├── graspnet_test_novel_val.pkl
├── 3doi_val.pkl
├── handal_easy_reasoning_val.pkl
├── handal_hard_reasoning_val.pkl
├── 3doi_easy_reasoning_val.pkl
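
To get a quick look at one of the annotation files listed above, a small helper along these lines may be useful. This is a minimal sketch that assumes the .pkl files are standard Python pickles; their internal structure is not described here, so the helper only reports the top-level type, length, and a few keys. The function name `summarize_pickle` is hypothetical and not part of the repository.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object.

    Assumption: the file is a standard Python pickle. Only unpickle files
    from a source you trust, since pickle can execute code on load.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show at most five keys so large dictionaries stay readable.
        summary["keys"] = list(obj.keys())[:5]
    elif isinstance(obj, (list, tuple)) and obj:
        summary["first_item_type"] = type(obj[0]).__name__
    return summary
```

For example, `print(summarize_pickle("./data/handal_mini_val.pkl"))` prints the summary for one evaluation file once the data is in place.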

You can use the following script to confirm that the data is organized correctly:

python data_curation/check_dataset.py
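
If you want a quick standalone check before running the script, the layout described above can also be verified with a few lines of Python. This is a minimal sketch and not the contents of data_curation/check_dataset.py: it only tests that a subset of the paths from the listings above exist, and the names `EXPECTED_PATHS` and `find_missing` are hypothetical.

```python
from pathlib import Path

# A subset of the paths from the directory listings above, relative to ./data/.
# Extend this list as needed; it is an illustration, not an exhaustive manifest.
EXPECTED_PATHS = [
    "lisa_data/ade20k",
    "lisa_data/coco",
    "lisa_data/refer_seg",
    "openx_train.pkl",
    "graspnet_train.pkl",
    "egoobjects_train.pkl",
    "rlbench_train.pkl",
    "HANDAL/without_depth",
    "openx/images",
    "openx/masks",
    "graspnet/images",
    "graspnet/masks",
    "handal_mini_val.pkl",
    "3doi_val.pkl",
]


def find_missing(data_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `find_missing("./data")` returns an empty list when every listed path is present; otherwise it returns the relative paths that still need to be downloaded or moved.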

About data curation

SAM2: We use SAM2 to generate affordance masks when the dataset provides box annotations.
Florence-2 + SAM2: We use Florence-2 to generate initial segmentation masks for some complete objects, and then refine them with SAM2. Please see Florence-2+SAM2.
VLPart + SAM2: We use VLPart to generate boxes for object parts, and then refine them with SAM2; see VLPart for details. We provide our inference demo scripts in data_curation/build_vlpart.py and data_curation/vlpart_sam2_tracking.py.
Reasoning Instruction: We provide two example scripts to generate reasoning instructions for the affordance segmentation task:
- data_curation/prompt_generation_handal_easy_reasoning.py: This script generates easy reasoning instructions for the HANDAL dataset.
- data_curation/prompt_generation_handal_hard_reasoning.py: This script generates hard reasoning instructions for the HANDAL dataset.