---
license: mit
tags:
- change captioning
- vision-language
- image-to-text
- procedural reasoning
- multimodal
- pytorch
datasets:
- clevr-change
- image-editing-request
- spot-the-diff
metrics:
- bleu
- meteor
- rouge
pipeline_tag: image-to-text
---

# ProCap: Experiment Materials

This repository contains the **official experimental materials** for the paper:

> **Imagine How to Change: Explicit Procedure Modeling for Change Captioning**

It provides **processed datasets**, **pre-trained model weights**, and **evaluation tools** for reproducing the results reported in the paper.

📦 All materials are also available via [Baidu Netdisk](https://pan.baidu.com/s/1t_YXB6J_vkuPxByn2hat2A)
**Extraction Code:** `5h7w`

---

## Contents

- [Data](#data)
- [Model Weights](#model-weights)
- [Evaluation](#evaluation)
- [Usage](#usage)
- [Citation](#citation)
- [License](#license)

---

## Data

All datasets are preprocessed into **pseudo-sequence format** (`.h5` files).

### Included Datasets

- **`CLEVR-data`**
  Processed pseudo-sequences for the **CLEVR-Change** dataset

- **`edit-data`**
  Processed pseudo-sequences for the **Image-Editing-Request** dataset

- **`spot-data`**
  Processed pseudo-sequences for the **Spot-the-Diff** dataset

- **`filter_files`**
  Confidence scores computed using [CLIP4IDC](https://github.com/sushizixin/CLIP4IDC)

- **`filtered-spot-captions`**
  Refined captions for the Spot-the-Diff dataset
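
After downloading, a quick way to verify an archive is to list the datasets inside one of the `.h5` files. The sketch below assumes a Python environment with `h5py` installed and uses a **placeholder file path** (the actual file names inside each folder may differ, so check your copy):

```shell
# List every dataset inside a processed .h5 file (path is a placeholder;
# requires a Python environment with h5py installed).
h5file="CLEVR-data/example.h5"   # replace with a real file from the archive
if [ -f "$h5file" ]; then
  python - "$h5file" <<'EOF'
import sys
import h5py

with h5py.File(sys.argv[1], "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "(group)")))
EOF
else
  echo "file not found: $h5file"
fi
```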

---

## Model Weights

This repository provides pre-trained weights for both stages described in the paper.

### Explicit Procedure Modeling (Stage 1)

- `pretrained_vqgan` – VQGAN models for each dataset
- `stage1_clevr_best`
- `stage1_edit_best`
- `stage1_spot_best`

### Implicit Procedure Captioning (Stage 2)

- `clevr_best`
- `edit_best`
- `spot_best`

> **Note:** Stage 1 checkpoints can be reused directly to initialize Stage 2 training.

---

## Evaluation

- **`densevid_eval`**
  Evaluation tools used for quantitative assessment

---

## Usage

### 1. Data Preparation

1. Move the caption files in `filtered-spot-captions` to the original caption directory of the **Spot-the-Diff** dataset.
2. Copy the processed data folders to the original dataset root and rename them as follows:

   | Dataset | Folder | Rename To |
   |------|------|------|
   | CLEVR-Change | `CLEVR-data` | `CLEVR_processed` |
   | Image-Editing-Request | `edit-data` | `edit_processed` |
   | Spot-the-Diff | `spot-data` | `spot_processed` |

3. Place `filter_files` in the project root directory.
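
The renaming in step 2 can be sketched as shell commands. The folder names follow the table above; the dataset root here is a **scratch directory stand-in**, so substitute your actual dataset root:

```shell
# Sketch of step 2: rename the processed folders as the table specifies.
# DATASET_ROOT is a scratch stand-in for the real dataset root directory.
DATASET_ROOT=$(mktemp -d)
mkdir -p "$DATASET_ROOT/CLEVR-data" "$DATASET_ROOT/edit-data" "$DATASET_ROOT/spot-data"

mv "$DATASET_ROOT/CLEVR-data" "$DATASET_ROOT/CLEVR_processed"
mv "$DATASET_ROOT/edit-data"  "$DATASET_ROOT/edit_processed"
mv "$DATASET_ROOT/spot-data"  "$DATASET_ROOT/spot_processed"

ls "$DATASET_ROOT"
```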

---

### 2. Model Weights

- Place `pretrained_vqgan` in the project root directory.
- To reuse Stage 1 weights during training, set `symlink_path` in the training scripts:

  ```bash
  symlink_path="/path/to/stage1/weight/dalle.pt"
  ```

- To evaluate with the pre-trained checkpoints, set `resume_path` in the evaluation scripts:

  ```bash
  resume_path="/path/to/pretrained/model/model.chkpt"
  ```
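
Before launching a run, both paths can be sanity-checked in one go. The values below are the same placeholders as above; replace them with your real checkpoint locations:

```shell
# Sanity-check the checkpoint paths before launching training or evaluation.
# Both values are placeholders copied from the snippets above.
symlink_path="/path/to/stage1/weight/dalle.pt"
resume_path="/path/to/pretrained/model/model.chkpt"

for p in "$symlink_path" "$resume_path"; do
  if [ -f "$p" ]; then
    echo "ok: $p"
  else
    echo "missing: $p"
  fi
done
```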

### 3. Evaluation Tool

Place the `densevid_eval` directory in the project root before evaluation.

---

## Citation

If you find our work or this repository useful, please consider citing our paper:

```bibtex
@inproceedings{sun2026imagine,
  title={Imagine How To Change: Explicit Procedure Modeling for Change Captioning},
  author={Sun, Jiayang and Guo, Zixin and Cao, Min and Zhu, Guibo and Laaksonen, Jorma},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```

---

## License

This repository is released under the MIT License.