---
license: mit
tags:
  - change captioning
  - vision-language
  - image-to-text
  - procedural reasoning
  - multimodal
  - pytorch
datasets:
  - clevr-change
  - image-editing-request
  - spot-the-diff
metrics:
  - bleu
  - meteor
  - rouge
pipeline_tag: image-to-text
---

# ProCap: Experiment Materials

This repository contains the official experimental materials for the paper:

**Imagine How to Change: Explicit Procedure Modeling for Change Captioning**

It provides processed datasets, pre-trained model weights, and evaluation tools for reproducing the results reported in the paper.

📦 All materials are also available via Baidu Netdisk (extraction code: `5h7w`).


## Contents


### Data

All datasets are preprocessed into a pseudo-sequence format (`.h5` files) generated with VFIformer.
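To get a quick look at what a processed archive contains, the `.h5` files can be inspected with `h5py`. This is a generic sketch: the file path `CLEVR_processed/train.h5` is a placeholder, and the actual file and key names depend on the archive you downloaded.

```python
# Sketch: listing every dataset (name, shape, dtype) inside an HDF5 archive.
# The path below is a placeholder, not a guaranteed file name.
import h5py


def list_h5_contents(path: str) -> dict:
    """Return {dataset_path: (shape, dtype_str)} for every dataset in an HDF5 file."""
    contents = {}

    def visit(name, obj):
        # visititems walks groups recursively; keep only leaf datasets
        if isinstance(obj, h5py.Dataset):
            contents[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return contents


if __name__ == "__main__":
    for name, (shape, dtype) in list_h5_contents("CLEVR_processed/train.h5").items():
        print(name, shape, dtype)
```

This only reads metadata, so it is cheap even on large archives.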

#### Included Datasets

- `CLEVR-data`: processed pseudo-sequences for the CLEVR-Change dataset
- `edit-data`: processed pseudo-sequences for the Image-Editing-Request dataset
- `spot-data`: processed pseudo-sequences for the Spot-the-Diff dataset
- `filter_files`: confidence scores computed using CLIP4IDC
- `filtered-spot-captions`: refined captions for the Spot-the-Diff dataset


### Model Weights

This repository provides pre-trained weights for both stages described in the paper.

#### Explicit Procedure Modeling (Stage 1)

- `pretrained_vqgan`: VQGAN models for each dataset
- `stage1_clevr_best`
- `stage1_edit_best`
- `stage1_spot_best`

#### Implicit Procedure Captioning (Stage 2)

- `clevr_best`
- `edit_best`
- `spot_best`

Note: Stage 1 checkpoints can be directly reused to initialize Stage 2 training.
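The reuse pattern above can be sketched as follows. This is an illustrative PyTorch idiom, not the repository's actual loading code; the `"model"` checkpoint key is an assumption, and `strict=False` simply lets any Stage-2-only parameters keep their fresh initialization.

```python
# Sketch: initializing a Stage 2 model from a Stage 1 checkpoint.
# Not the repository's actual code; the "model" key is an assumption.
import torch
import torch.nn as nn


def init_from_stage1(model: nn.Module, ckpt_path: str) -> list:
    """Load overlapping weights from a Stage 1 checkpoint; return missing keys."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # handle both a raw state_dict and a wrapped {"model": state_dict} checkpoint
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    # strict=False: parameters absent from the checkpoint stay freshly initialized
    missing, unexpected = model.load_state_dict(state, strict=False)
    return missing
```

A call like `init_from_stage1(stage2_model, "/path/to/stage1/weight/dalle.pt")` then reports which parameters were not covered by the Stage 1 weights.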


### Evaluation

- `densevid_eval`: evaluation tools used for quantitative assessment

## Usage

### 1. Data Preparation

1. Move the caption files in `filtered-spot-captions` into the original caption directory of the Spot-the-Diff dataset.
2. Copy the processed data folders into the corresponding dataset roots and rename them as follows:

   | Dataset | Folder | Rename to |
   | --- | --- | --- |
   | CLEVR-Change | `CLEVR-data` | `CLEVR_processed` |
   | Image-Editing-Request | `edit-data` | `edit_processed` |
   | Spot-the-Diff | `spot-data` | `spot_processed` |

3. Place `filter_files` in the project root directory.
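The copy-and-rename step above can be scripted. This is a stdlib-only sketch, not part of the repository; both root paths are placeholders you would point at your own download and dataset directories.

```python
# Sketch of the copy-and-rename step: place each processed folder inside its
# dataset root under the required name. Both roots are placeholder paths.
import shutil
from pathlib import Path

# processed folder -> (dataset directory, required name)
RENAMES = {
    "CLEVR-data": ("CLEVR-Change", "CLEVR_processed"),
    "edit-data": ("Image-Editing-Request", "edit_processed"),
    "spot-data": ("Spot-the-Diff", "spot_processed"),
}


def install_processed(download_root: str, datasets_root: str) -> None:
    """Copy each downloaded processed folder to <datasets_root>/<dataset>/<new_name>."""
    for src_name, (dataset_dir, new_name) in RENAMES.items():
        src = Path(download_root) / src_name
        dst = Path(datasets_root) / dataset_dir / new_name
        if src.is_dir():  # skip datasets you did not download
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copytree(src, dst, dirs_exist_ok=True)
```

Using `copytree` with `dirs_exist_ok=True` (Python 3.8+) makes the step safe to re-run.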

### 2. Model Weights

- Place `pretrained_vqgan` in the project root directory.
- To reuse Stage 1 weights during training, set `symlink_path` in the training scripts, e.g. `symlink_path="/path/to/stage1/weight/dalle.pt"`.
- To evaluate with pre-trained checkpoints, set `resume_path` in the evaluation scripts, e.g. `resume_path="/path/to/pretrained/model/model.chkpt"`.
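A mistyped checkpoint path usually surfaces only after a script has started, so it can be worth failing fast. This is a small stdlib-only helper (not part of the repository) whose argument names simply mirror the script settings above:

```python
# Sketch: verify configured checkpoint paths before launching a script.
# The argument names mirror the script settings; the paths are placeholders.
from pathlib import Path
from typing import Optional


def check_checkpoints(symlink_path: Optional[str] = None,
                      resume_path: Optional[str] = None) -> None:
    """Raise FileNotFoundError for any configured path that is not a file."""
    for name, path in {"symlink_path": symlink_path,
                       "resume_path": resume_path}.items():
        if path is not None and not Path(path).is_file():
            raise FileNotFoundError(f"{name} does not point to a checkpoint: {path}")
```

For example, `check_checkpoints(symlink_path="/path/to/stage1/weight/dalle.pt")` raises immediately if the file is missing.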

### 3. Evaluation Tool

Place the `densevid_eval` directory in the project root before running evaluation.

## Citation

If you find our work or this repository useful, please consider citing our paper:

@inproceedings{sun2026imagine,
  title={Imagine How To Change: Explicit Procedure Modeling for Change Captioning},
  author={Sun, Jiayang and Guo, Zixin and Cao, Min and Zhu, Guibo and Laaksonen, Jorma},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

## License

This repository is released under the MIT License.