ProCrop: Learning Aesthetic Image Cropping from Professional Compositions

This is the headline supervised checkpoint for the AAAI 2026 paper "ProCrop: Learning Aesthetic Image Cropping from Professional Compositions" by Zhang et al.

Model Description

ProCrop is a retrieval-augmented framework for aesthetic image cropping that leverages professional photography compositions as guidance. Given a query image, ProCrop:

1. Retrieves compositionally similar professional images from a large database (AVA / CGL) using SAM embeddings and Faiss nearest-neighbor search.
2. Fuses retrieved features with the query via cross-attention.
3. Predicts diverse crop proposals ranked by aesthetic score using a Conditional DETR decoder.
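
The retrieval step can be pictured as a nearest-neighbor lookup over embedding vectors. The sketch below is a dependency-free stand-in, not ProCrop's code: it runs an exhaustive squared-L2 search in plain Python, which is the same exact search a flat Faiss index performs. The names `retrieve_top_k`, `database`, and `k` are hypothetical, and the toy vectors are 2-D only for readability.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_top_k(query, database, k=3):
    """Return the indices of the k database vectors closest to `query`.

    Exhaustive search; ties are broken in favor of the lower index.
    """
    scored = ((squared_l2(query, vec), idx) for idx, vec in enumerate(database))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Toy 2-D "embeddings"; real SAM embeddings are much higher-dimensional.
database = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
neighbors = retrieve_top_k([1.0, 1.0], database, k=2)
```

Here `neighbors` holds the indices of the two reference vectors nearest the query. In a real pipeline those indices would select the professional images whose features are passed on to the fusion step.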

Reported Performance (FLMS supervised setting)

| Metric | Value |
| --- | --- |
| IoU | 0.843 |
| BDE (Displacement) | 0.036 |

This checkpoint matches the FLMS row of Table 3 in the paper.
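
For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed for a single predicted crop: IoU is the overlap ratio with the ground-truth box (higher is better), and BDE is the mean displacement of the four crop edges, normalized by image width or height (lower is better). It is a minimal illustration that assumes `(x1, y1, x2, y2)` pixel boxes. It is not the repository's evaluation code, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def bde(pred, gt, width, height):
    """Boundary displacement error: mean normalized shift of the four edges."""
    dx = abs(pred[0] - gt[0]) + abs(pred[2] - gt[2])  # left and right edges
    dy = abs(pred[1] - gt[1]) + abs(pred[3] - gt[3])  # top and bottom edges
    return (dx / width + dy / height) / 4.0


# Toy example: the prediction is shifted 10 px left in a 200x200 image.
pred = (10, 20, 110, 120)
gt = (20, 20, 120, 120)
example_iou = iou(pred, gt)            # 9000 / 11000
example_bde = bde(pred, gt, 200, 200)  # (0.05 + 0.05 + 0 + 0) / 4
```

Dataset-level numbers such as those in the table are typically the mean of these per-image values over the test set.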

Checkpoint Details

| Property | Value |
| --- | --- |
| File | `procrop_flms_supervised.pth` |
| Size | 512 MB |
| Original filename | `checkpoint0008200.8425250053405762.pth` |
| Trainable params | ~44.8M |
| Backbone | ResNet-50 (DC5) + Transformer encoder/decoder |
| Training data | CPCDataset (supervised) + AVA retrieval references |
| Evaluation | FLMS test set, IoU = 0.8425 |
| Training epoch | 83 |
| Crop queries | 24 (Conditional DETR style) |
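
To sanity-check a downloaded file against the table, a small inspection helper can be useful. The sketch below is an assumption-laden illustration: it guesses that the weights sit under a `"model"` key, as in DETR-style training scripts, which has not been verified for this checkpoint. The helper names are hypothetical. The element count includes non-trainable buffers (for example batch-norm statistics), so it may not match the ~44.8M trainable figure exactly.

```python
def count_parameters(state_dict):
    """Total number of elements across all tensors in a state dict."""
    return sum(t.numel() for t in state_dict.values())


def summarize_checkpoint(ckpt):
    """Summarize an already-loaded checkpoint dictionary.

    ASSUMPTION: weights are stored under a "model" key; if that key is
    missing, the dictionary itself is treated as the state dict.
    """
    state = ckpt.get("model", ckpt)
    return {
        "top_level_keys": sorted(ckpt.keys()),
        "num_tensors": len(state),
        "num_elements": count_parameters(state),
    }


def load_and_summarize(path):
    """Load a .pth file on CPU and summarize it (requires PyTorch)."""
    import torch  # imported lazily so the helpers above work without it

    # weights_only=False allows pickled non-tensor objects (such as argparse
    # namespaces) to load; only use it on files you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    return summarize_checkpoint(ckpt)
```

Calling `load_and_summarize("procrop_flms_supervised.pth")` would then report the top-level keys and the total element count, provided the key layout matches the assumption above.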

How to Use

1. Clone the GitHub repository

git clone https://github.com/BWGZK-keke/ProCrop.git
cd ProCrop
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git

2. Download this checkpoint

from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="BWGZK/ProCrop",
    filename="procrop_flms_supervised.pth"
)

Or with the CLI:

huggingface-cli download BWGZK/ProCrop procrop_flms_supervised.pth --local-dir ./checkpoints

3. Run inference on a single image

# Run from the repository root. The checkpoint from step 2 is in ./checkpoints there, hence ../checkpoints after the cd.
cd cropping
python test_singleimage.py \
    --dataset_root /path/to/your/images \
    --retrieval_cache_dir /path/to/retrieval_tables \
    --retrieval_img_dir /path/to/CGL_images \
    --resume ../checkpoints/procrop_flms_supervised.pth \
    --crop_savepath ./results
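
Independently of what `test_singleimage.py` does internally, turning per-query predictions into crop rectangles is simple arithmetic. The sketch below assumes the DETR-family convention of normalized `(cx, cy, w, h)` boxes and one raw score per query. `decode_crops` and `top_n` are hypothetical names, and the repository's own post-processing may differ.

```python
import math


def decode_crops(boxes, logits, width, height, top_n=3):
    """Convert per-query outputs into ranked pixel-space crops.

    boxes  : list of (cx, cy, w, h), normalized to [0, 1] (assumed format)
    logits : list of raw aesthetic scores, one per query
    Returns [(score, (x1, y1, x2, y2)), ...] sorted best first.
    """
    crops = []
    for (cx, cy, w, h), logit in zip(boxes, logits):
        score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        # Center/size to corners, scaled to pixels and clamped to the image.
        x1 = max(0.0, (cx - w / 2) * width)
        y1 = max(0.0, (cy - h / 2) * height)
        x2 = min(float(width), (cx + w / 2) * width)
        y2 = min(float(height), (cy + h / 2) * height)
        crops.append((score, (round(x1), round(y1), round(x2), round(y2))))
    crops.sort(key=lambda c: c[0], reverse=True)
    return crops[:top_n]


# Two toy queries on a 400x200 image: the full frame and a centered half-size crop.
ranked = decode_crops(
    boxes=[(0.5, 0.5, 1.0, 1.0), (0.5, 0.5, 0.5, 0.5)],
    logits=[0.0, 2.0],
    width=400,
    height=200,
)
best_score, best_box = ranked[0]
```

With an image library such as Pillow, `best_box` could be passed to `Image.crop` to produce the final crop.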

4. Evaluate on FLMS

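# Run from the repository root. The checkpoint from step 2 is in ./checkpoints there, hence ../checkpoints after the cd.
cd cropping
python main_cpc.py \
    --dataset_root /path/to/FLMS \
    --retrieval_cache_dir /path/to/retrieval_tables \
    --resume ../checkpoints/procrop_flms_supervised.pth \
    --eval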

You also need:

- Precomputed retrieval tables from BWGZK/procrop_dataset
- The SAM ViT-B checkpoint, only if training on GAIC/CAD (available from the official Segment Anything release)

Architecture

ProCrop extends Conditional DETR with a retrieval augmentation module:

- Backbone: ResNet-50 with dilated C5 stage
- Encoder: 6-layer transformer encoder for the query image
- Retrieval fusion: Cross-attention between query features and top-K retrieved SAM embeddings (64×256)
- Decoder: 6-layer transformer decoder with N=24 learnable crop queries
- Heads:
  - 4-dim bounding-box MLP (3 layers)
  - 1-dim aesthetic-score classification head (binary focal loss)
- EMA self-distillation: Mean-teacher framework for weakly-supervised training on CAD
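
The mean-teacher idea in the last item comes down to one update rule: after each student step, the teacher's weights move a small fraction toward the student's. The sketch below shows that rule on plain Python floats. The momentum value and the dictionary-of-floats representation are illustrative assumptions, not values taken from the ProCrop code.

```python
def ema_update(teacher, student, momentum=0.999):
    """In-place EMA step: teacher <- m * teacher + (1 - m) * student.

    `teacher` and `student` map parameter names to values and must share keys.
    The default momentum is a common choice, not ProCrop's setting.
    """
    for name, student_value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * student_value
    return teacher


# Toy weights; a low momentum is used so the effect of one step is visible.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
ema_update(teacher, student, momentum=0.9)
```

After this single step the teacher has moved one tenth of the way toward the student. With a momentum close to 1 the teacher changes slowly, which is what makes its predictions usable as stable targets for the student.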

Core implementation: cropping/models/conditional_detr_cpc.py
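
As a rough picture of the retrieval-fusion item, the sketch below implements plain single-head scaled dot-product cross-attention, with query-image features attending over retrieved features. It is deliberately minimal and written in pure Python: there are no learned projections, multiple heads, or residual connections, and it should not be read as the module in `conditional_detr_cpc.py`. All names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: [Nq][d] features of the query image
    keys   : [Nk][d] features of the retrieved references
    values : [Nk][dv] payload carried by each reference
    Returns an [Nq][dv] list: one fused vector per query position.
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Two identical keys receive equal weight, so the output is the mean of the values.
uniform = cross_attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[2.0], [4.0]])
# A key aligned with the query dominates the mixture.
peaked = cross_attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 0.0]], [[1.0], [0.0]])
```

The two toy calls illustrate the behavior that matters for fusion: references that resemble the query contribute more to the fused feature than those that do not.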

Related Resources

- Code (GitHub): https://github.com/BWGZK-keke/ProCrop
- Paper (arXiv): https://arxiv.org/abs/2505.22490
- Dataset (HuggingFace): https://huggingface.co/datasets/BWGZK/procrop_dataset
  - CAD dataset (242K weakly annotated images)
  - Precomputed retrieval tables
  - Pre-extracted SAM embedding databases

Citation

@article{ProCrop2025,
  title={ProCrop: Learning Aesthetic Image Cropping from Professional Compositions},
  author={Zhang, Ke and Ding, Tianyu and Jiang, Jiachen and Chen, Tianyi and Zharkov, Ilya and Patel, Vishal M. and Liang, Luming},
  journal={arXiv preprint arXiv:2505.22490},
  year={2025}
}

License

Apache 2.0. The model builds on Conditional DETR, RALF, and Segment Anything; please consult their respective licenses.

Paper for BWGZK/ProCrop: ProCrop: Learning Aesthetic Image Cropping from Professional Compositions (arXiv 2505.22490, published May 28, 2025)