## Dataset

To train our affordance segmentation model, we use two types of data:
* **General Segmentation Data**: This follows [LISA](https://github.com/dvlab-research/LISA).
* **Affordance Segmentation Data**: This is a large-scale dataset that we collect.

### General Segmentation Data
These data are organized as follows:
```
./data/
├── lisa_data
│   ├── ade20k
│   ├── coco
│   ├── cocostuff
│   ├── llava_dataset
│   ├── mapillary
│   ├── reason_seg
│   ├── refer_seg
│   ├── vlpart
```

### Affordance Segmentation Data

We employ images from HANDAL, Open-X, GraspNet, EgoObjects, and RLBench in our affordance segmentation task. 

The HANDAL data is downloaded and organized according to its official [repo](https://github.com/NVlabs/HANDAL).
The other data can be downloaded from [Hugging Face](https://huggingface.co/datasets/Dongming97/RAGNet).

The training data is organized as follows:
```
./data/
├── openx_train.pkl
├── graspnet_train.pkl
├── egoobjects_train.pkl
├── rlbench_train.pkl
├── handal_hard_reasoning_train.pkl
├── egoobjects_easy_reasoning_train.pkl
├── egoobjects_hard_reasoning_train.pkl
├── HANDAL
│   ├── without_depth
│       ├── handal_dataset_adjustable_wrenches
│       ├── handal_dataset_combinational_wrenches
│       ├── handal_dataset_fixed_joint_pliers
│       ├── ...
├── openx
│   ├── images
│       ├── fractal20220817_data
│       ├── bridge
│   ├── masks
│       ├── fractal20220817_data
│       ├── bridge
├── graspnet
│   ├── images
│   ├── masks
│   ├── test_seen
│   ├── test_novel
├── egoobjects
│   ├── images
│   ├── masks
├── rlbench
│   ├── images
│   ├── masks
├── 3doi
│   ├── images
│   ├── masks
```

The evaluation data lives in the same directory, but uses `*_val.pkl` files instead of `*_train.pkl`:

```
./data/
├── handal_mini_val.pkl
├── graspnet_test_seen_val.pkl
├── graspnet_test_novel_val.pkl
├── 3doi_val.pkl
├── handal_easy_reasoning_val.pkl
├── handal_hard_reasoning_val.pkl
├── 3doi_easy_reasoning_val.pkl
```
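The internal structure of the `.pkl` annotation files is not described here, so the following is only a minimal sketch for peeking at one of them. It assumes the files are standard Python pickles; `summarize_pickle` is a hypothetical helper, not part of this repository, and it reports only the top-level type and length rather than guessing at field names.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Return (type name, length or None) for the object stored in a pickle file.

    Hypothetical helper for illustration; it makes no assumption about the
    fields inside the annotation files.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    # Lists, tuples, and dicts expose a length; other objects may not.
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For example, `summarize_pickle("./data/handal_mini_val.pkl")` would return the container type and the number of top-level entries. Unpickling can execute arbitrary code, so only load files obtained from the sources linked above.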

You can use the following script to confirm that the data is organized correctly:
```bash
python data_curation/check_dataset.py
```
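If you want a quick look at what such a check involves, the sketch below tests for the files and folders listed in the trees above. It is an illustration only and is not the contents of `data_curation/check_dataset.py`; the function name `find_missing` and the two lists are assumptions copied from the layout shown in this document.

```python
from pathlib import Path

# Assumption: these entries mirror the directory trees in this document.
EXPECTED_FILES = [
    "openx_train.pkl",
    "graspnet_train.pkl",
    "egoobjects_train.pkl",
    "rlbench_train.pkl",
    "handal_hard_reasoning_train.pkl",
    "egoobjects_easy_reasoning_train.pkl",
    "egoobjects_hard_reasoning_train.pkl",
    "handal_mini_val.pkl",
    "graspnet_test_seen_val.pkl",
    "graspnet_test_novel_val.pkl",
    "3doi_val.pkl",
    "handal_easy_reasoning_val.pkl",
    "handal_hard_reasoning_val.pkl",
    "3doi_easy_reasoning_val.pkl",
]
EXPECTED_DIRS = [
    "lisa_data",
    "HANDAL/without_depth",
    "openx/images",
    "openx/masks",
    "graspnet/images",
    "graspnet/masks",
    "graspnet/test_seen",
    "graspnet/test_novel",
    "egoobjects/images",
    "egoobjects/masks",
    "rlbench/images",
    "rlbench/masks",
    "3doi/images",
    "3doi/masks",
]


def find_missing(root, files=EXPECTED_FILES, dirs=EXPECTED_DIRS):
    """Return the expected entries that do not exist under `root`.

    Missing files are listed first, then missing directories.
    """
    root = Path(root)
    missing = [name for name in files if not (root / name).is_file()]
    missing += [name for name in dirs if not (root / name).is_dir()]
    return missing
```

Calling `find_missing("./data")` returns an empty list when every listed entry is present; otherwise it returns the relative paths that still need to be downloaded or moved into place.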

### About data curation
1. **SAM2**: We use SAM2 to generate affordance masks when the dataset provides box annotations.
2. **Florence-2 + SAM2**: We use Florence-2 to generate initial segmentation masks for some complete objects, and then refine them with SAM2. Please see [Florence-2+SAM2](https://github.com/IDEA-Research/Grounded-SAM-2).
3. **VLPart + SAM2**: We use [VLPart](https://github.com/facebookresearch/VLPart) to generate boxes for object parts, and then refine them into masks with SAM2. We provide our inference demo scripts in `data_curation/build_vlpart.py` and `data_curation/vlpart_sam2_tracking.py`.
4. **Reasoning Instruction**: We provide two example scripts to generate reasoning instructions for the affordance segmentation task:
   - `data_curation/prompt_generation_handal_easy_reasoning.py`: This script generates easy reasoning instructions for the HANDAL dataset.
   - `data_curation/prompt_generation_handal_hard_reasoning.py`: This script generates hard reasoning instructions for the HANDAL dataset.