---
license: mit
tags:
  - change captioning
  - vision-language
  - image-to-text
  - procedural reasoning
  - multimodal
  - pytorch
datasets:
  - clevr-change
  - image-editing-request
  - spot-the-diff
metrics:
  - bleu
  - meteor
  - rouge
pipeline_tag: image-to-text
---

# ProCap: Experiment Materials

This repository contains the official experimental materials for the paper:

**Imagine How to Change: Explicit Procedure Modeling for Change Captioning**

It provides processed datasets, pre-trained model weights, and evaluation tools for reproducing the results reported in the paper.

📦 All materials are also available via Baidu Netdisk (extraction code: `5h7w`).


## Contents


### Data

All datasets are preprocessed into a pseudo-sequence format (`.h5` files) generated with VFIformer.
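To get a quick look at what a processed archive contains, the `.h5` files can be inspected with `h5py`. This is a generic sketch: the file path `CLEVR_processed/train.h5` is a placeholder, and the actual file and key names depend on the archive you downloaded.

```python
# Sketch: listing every dataset (name, shape, dtype) inside an HDF5 archive.
# The path below is a placeholder, not a guaranteed file name.
import h5py


def list_h5_contents(path: str) -> dict:
    """Return {dataset_path: (shape, dtype_str)} for every dataset in an HDF5 file."""
    contents = {}

    def visit(name, obj):
        # visititems walks groups recursively; keep only leaf datasets
        if isinstance(obj, h5py.Dataset):
            contents[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return contents


if __name__ == "__main__":
    for name, (shape, dtype) in list_h5_contents("CLEVR_processed/train.h5").items():
        print(name, shape, dtype)
```

This only reads metadata, so it is cheap even on large archives.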

#### Included Datasets

- `CLEVR-data`: processed pseudo-sequences for the CLEVR-Change dataset
- `edit-data`: processed pseudo-sequences for the Image-Editing-Request dataset
- `spot-data`: processed pseudo-sequences for the Spot-the-Diff dataset
- `filter_files`: confidence scores computed using CLIP4IDC
- `filtered-spot-captions`: refined captions for the Spot-the-Diff dataset


### Model Weights

This repository provides pre-trained weights for both stages described in the paper.

#### Explicit Procedure Modeling (Stage 1)

- `pretrained_vqgan`: VQGAN models for each dataset
- `stage1_clevr_best`
- `stage1_edit_best`
- `stage1_spot_best`

#### Implicit Procedure Captioning (Stage 2)

- `clevr_best`
- `edit_best`
- `spot_best`

Note: Stage 1 checkpoints can be directly reused to initialize Stage 2 training.
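The reuse pattern above can be sketched as follows. This is an illustrative PyTorch idiom, not the repository's actual loading code; the `"model"` checkpoint key is an assumption, and `strict=False` simply lets any Stage-2-only parameters keep their fresh initialization.

```python
# Sketch: initializing a Stage 2 model from a Stage 1 checkpoint.
# Not the repository's actual code; the "model" key is an assumption.
import torch
import torch.nn as nn


def init_from_stage1(model: nn.Module, ckpt_path: str) -> list:
    """Load overlapping weights from a Stage 1 checkpoint; return missing keys."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # handle both a raw state_dict and a wrapped {"model": state_dict} checkpoint
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    # strict=False: parameters absent from the checkpoint stay freshly initialized
    missing, unexpected = model.load_state_dict(state, strict=False)
    return missing
```

A call like `init_from_stage1(stage2_model, "/path/to/stage1/weight/dalle.pt")` then reports which parameters were not covered by the Stage 1 weights.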


### Evaluation

- `densevid_eval`: evaluation tools used for quantitative assessment

## Usage

### 1. Data Preparation

1. Move the caption files in `filtered-spot-captions` into the original caption directory of the Spot-the-Diff dataset.
2. Copy the processed data folders into the corresponding dataset roots and rename them as follows:

   | Dataset | Folder | Rename to |
   | --- | --- | --- |
   | CLEVR-Change | `CLEVR-data` | `CLEVR_processed` |
   | Image-Editing-Request | `edit-data` | `edit_processed` |
   | Spot-the-Diff | `spot-data` | `spot_processed` |

3. Place `filter_files` in the project root directory.
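The copy-and-rename step above can be scripted. This is a stdlib-only sketch, not part of the repository; both root paths are placeholders you would point at your own download and dataset directories.

```python
# Sketch of the copy-and-rename step: place each processed folder inside its
# dataset root under the required name. Both roots are placeholder paths.
import shutil
from pathlib import Path

# processed folder -> (dataset directory, required name)
RENAMES = {
    "CLEVR-data": ("CLEVR-Change", "CLEVR_processed"),
    "edit-data": ("Image-Editing-Request", "edit_processed"),
    "spot-data": ("Spot-the-Diff", "spot_processed"),
}


def install_processed(download_root: str, datasets_root: str) -> None:
    """Copy each downloaded processed folder to <datasets_root>/<dataset>/<new_name>."""
    for src_name, (dataset_dir, new_name) in RENAMES.items():
        src = Path(download_root) / src_name
        dst = Path(datasets_root) / dataset_dir / new_name
        if src.is_dir():  # skip datasets you did not download
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copytree(src, dst, dirs_exist_ok=True)
```

Using `copytree` with `dirs_exist_ok=True` (Python 3.8+) makes the step safe to re-run.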

### 2. Model Weights

- Place `pretrained_vqgan` in the project root directory.
- To reuse Stage 1 weights during training, set `symlink_path` in the training scripts, e.g. `symlink_path="/path/to/stage1/weight/dalle.pt"`.
- To evaluate with pre-trained checkpoints, set `resume_path` in the evaluation scripts, e.g. `resume_path="/path/to/pretrained/model/model.chkpt"`.
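A mistyped checkpoint path usually surfaces only after a script has started, so it can be worth failing fast. This is a small stdlib-only helper (not part of the repository) whose argument names simply mirror the script settings above:

```python
# Sketch: verify configured checkpoint paths before launching a script.
# The argument names mirror the script settings; the paths are placeholders.
from pathlib import Path
from typing import Optional


def check_checkpoints(symlink_path: Optional[str] = None,
                      resume_path: Optional[str] = None) -> None:
    """Raise FileNotFoundError for any configured path that is not a file."""
    for name, path in {"symlink_path": symlink_path,
                       "resume_path": resume_path}.items():
        if path is not None and not Path(path).is_file():
            raise FileNotFoundError(f"{name} does not point to a checkpoint: {path}")
```

For example, `check_checkpoints(symlink_path="/path/to/stage1/weight/dalle.pt")` raises immediately if the file is missing.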

### 3. Evaluation Tool

Place the `densevid_eval` directory in the project root before running evaluation.

## Citation

If you find our work or this repository useful, please consider citing our paper:

@inproceedings{sun2026imagine,
  title={Imagine How To Change: Explicit Procedure Modeling for Change Captioning},
  author={Sun, Jiayang and Guo, Zixin and Cao, Min and Zhu, Guibo and Laaksonen, Jorma},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

## License

This repository is released under the MIT License.