metadata
license: apache-2.0
tags:
- image-cropping
- aesthetic-cropping
- computer-vision
- retrieval-augmented
- conditional-detr
pipeline_tag: image-to-image
library_name: pytorch
datasets:
- BWGZK/procrop_dataset
language:
- en
ProCrop: Learning Aesthetic Image Cropping from Professional Compositions
This is the headline supervised checkpoint for the AAAI 2026 paper "ProCrop: Learning Aesthetic Image Cropping from Professional Compositions" by Zhang et al.
Model Description
ProCrop is a retrieval-augmented framework for aesthetic image cropping that leverages professional photography compositions as guidance. Given a query image, ProCrop:
- Retrieves compositionally similar professional images from a large database (AVA / CGL) using SAM embeddings and Faiss nearest-neighbor search.
- Fuses retrieved features with the query via cross-attention.
- Predicts diverse crop proposals ranked by aesthetic score using a Conditional DETR decoder.
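The retrieval step can be illustrated with a brute-force nearest-neighbor search. This is a minimal pure-Python stand-in for the Faiss search mentioned above, not the repository's code: the function names, the toy 2-D embeddings, and the choice of L2 distance are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_top_k(
    query: Sequence[float],
    database: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k database entries closest to the query."""
    scored = [(i, l2_distance(query, emb)) for i, emb in enumerate(database)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy example: four 2-D "embeddings" standing in for the professional-image database.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
neighbors = retrieve_top_k([0.9, 0.1], database, k=2)
print(neighbors)  # indices 1 and 0 are the two closest entries
```

An exhaustive scan like this costs time linear in the database size per query, which is the cost that an indexed library such as Faiss is meant to reduce for large collections.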
Reported Performance (FLMS supervised setting)
| Metric | Value |
|---|---|
| IoU | 0.843 |
| BDE (boundary displacement error) | 0.036 |
This checkpoint matches the FLMS row of Table 3 in the paper.
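For readers unfamiliar with the two metrics, the sketch below computes them for a single predicted crop using the definitions common in the cropping literature: IoU is intersection area over union area, and BDE is the mean absolute offset of the four crop edges, each normalized by image width or height. The function names and box format `(x1, y1, x2, y2)` in pixels are assumptions; the repository's evaluation code may differ in details such as how scores are averaged over a dataset.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def bde(pred: Box, gt: Box, width: float, height: float) -> float:
    """Mean absolute edge displacement, normalized by image width/height."""
    scales = (width, height, width, height)
    return sum(abs(p - g) / s for p, g, s in zip(pred, gt, scales)) / 4.0


# Toy example on a 200x200 image: the prediction is shifted 10 px to the right.
pred_box = (10.0, 20.0, 110.0, 120.0)
gt_box = (0.0, 20.0, 100.0, 120.0)
print(iou(pred_box, gt_box))            # 9000 / 11000
print(bde(pred_box, gt_box, 200, 200))  # (0.05 + 0 + 0.05 + 0) / 4
```

Higher IoU and lower BDE both indicate a prediction closer to the reference crop.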
Checkpoint Details
| Property | Value |
|---|---|
| File | procrop_flms_supervised.pth |
| Size | 512 MB |
| Original filename | checkpoint0008200.8425250053405762.pth |
| Trainable params | ~44.8M |
| Backbone | ResNet-50 (DC5) + Transformer encoder/decoder |
| Training data | CPCDataset (supervised) + AVA retrieval references |
| Evaluation | FLMS test set, IoU = 0.8425 |
| Training epoch | 83 |
| Crop queries | 24 (Conditional DETR style) |
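To check the downloaded file against this table, one option is to load it and count tensor elements. The sketch below assumes the DETR-style layout in which the weights sit under a `model` key next to entries such as `epoch`; that layout is an assumption and is not confirmed by this card. The helper names are hypothetical. The count covers every tensor in the state dict, including buffers such as BatchNorm statistics, so it can differ slightly from the trainable-parameter figure.

```python
from typing import Any, Dict, Mapping


def count_elements(state_dict: Mapping[str, Any]) -> int:
    """Sum numel() over all tensor-like values in a state dict."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))


def summarize_checkpoint(ckpt: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarize a loaded checkpoint dictionary.

    Assumption: weights live under ckpt["model"]; if that key is missing,
    the checkpoint itself is treated as a flat state dict.
    """
    weights = ckpt["model"] if "model" in ckpt else ckpt
    return {
        "top_level_keys": sorted(ckpt.keys()),
        "num_elements": count_elements(weights),
        "epoch": ckpt.get("epoch"),
    }


# Usage (requires PyTorch; shown as comments so the sketch runs without it):
#   import torch
#   ckpt = torch.load("procrop_flms_supervised.pth",
#                     map_location="cpu", weights_only=False)
#   print(summarize_checkpoint(ckpt))
```

`weights_only=False` unpickles arbitrary Python objects, so it should only be used on files from a source you trust.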
How to Use
1. Clone the GitHub repository
git clone https://github.com/BWGZK-keke/ProCrop.git
cd ProCrop
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
2. Download this checkpoint
from huggingface_hub import hf_hub_download
ckpt_path = hf_hub_download(
    repo_id="BWGZK/ProCrop",
    filename="procrop_flms_supervised.pth",
)
Or with the CLI:
huggingface-cli download BWGZK/ProCrop procrop_flms_supervised.pth --local-dir ./checkpoints
3. Run inference on a single image
cd cropping
python test_singleimage.py \
    --dataset_root /path/to/your/images \
    --retrieval_cache_dir /path/to/retrieval_tables \
    --retrieval_img_dir /path/to/CGL_images \
    --resume ./checkpoints/procrop_flms_supervised.pth \
    --crop_savepath ./results
4. Evaluate on FLMS
cd cropping
python main_cpc.py \
    --dataset_root /path/to/FLMS \
    --retrieval_cache_dir /path/to/retrieval_tables \
    --resume ./checkpoints/procrop_flms_supervised.pth \
    --eval
You also need:
- Precomputed retrieval tables from BWGZK/procrop_dataset
- SAM ViT-B checkpoint if training on GAIC/CAD: download here
Architecture
ProCrop extends Conditional DETR with a retrieval augmentation module:
- Backbone: ResNet-50 with dilated C5 stage
- Encoder: 6-layer transformer encoder for the query image
- Retrieval fusion: Cross-attention between query features and top-K retrieved SAM embeddings (64×256)
- Decoder: 6-layer transformer decoder with N=24 learnable crop queries
- Heads:
  - 4-dim bounding-box MLP (3 layers)
  - 1-dim aesthetic-score classification head (binary focal loss)
- EMA self-distillation: Mean-teacher framework for weakly-supervised training on CAD
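Two of the training-side pieces named in the list, the binary focal loss and the EMA (mean-teacher) update, have compact standard forms. The sketch below shows those textbook forms on plain Python numbers. The defaults `alpha=0.25`, `gamma=2.0`, and `momentum=0.999` are common choices in the literature, not values confirmed for ProCrop, and the function names are hypothetical.

```python
import math
from typing import Dict


def binary_focal_loss(logit: float, target: int,
                      alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one binary prediction: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = 1.0 / (1.0 + math.exp(-logit))            # sigmoid probability of class 1
    p_t = p if target == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               momentum: float = 0.999) -> Dict[str, float]:
    """Mean-teacher update: teacher <- momentum * teacher + (1 - momentum) * student."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# An uncertain prediction (logit 0, p = 0.5) on a positive example.
print(binary_focal_loss(0.0, 1))
# The teacher moves a small step toward the student.
print(ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9))
```

The `(1 - p_t)^gamma` factor shrinks the loss on examples the model already classifies confidently, while the EMA update lets the teacher track a smoothed version of the student without receiving gradients itself.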
Core implementation: cropping/models/conditional_detr_cpc.py
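As an illustration of the retrieval-fusion idea, here is a minimal single-head cross-attention in pure Python, where query-image features attend over retrieved reference features and the result is added back residually. It is a sketch under simplifying assumptions, not the code in the file above: it omits the learned query/key/value projections, multi-head splitting, and normalization that a real transformer layer would have, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Each query attends over memory (used as both keys and values), then adds the result."""
    dim = len(memory[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    fused = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in memory]
        weights = softmax(scores)
        attended = [sum(w * v[d] for w, v in zip(weights, memory)) for d in range(dim)]
        fused.append([qi + ai for qi, ai in zip(q, attended)])  # residual connection
    return fused


# One query token and two retrieved tokens, all 2-dimensional.
query_tokens = [[0.0, 0.0]]
retrieved_tokens = [[1.0, 0.0], [0.0, 1.0]]
print(cross_attention(query_tokens, retrieved_tokens))  # equal weights on both tokens
```

Because the output has one vector per query token, the fused features keep the query image's shape regardless of how many retrieved tokens are attended over.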
Related Resources
- Code (GitHub): https://github.com/BWGZK-keke/ProCrop
- Paper (arXiv): https://arxiv.org/abs/2505.22490
- Dataset (HuggingFace): https://huggingface.co/datasets/BWGZK/procrop_dataset
  - CAD dataset (242K weakly annotated images)
  - Precomputed retrieval tables
  - Pre-extracted SAM embedding databases
Citation
@article{ProCrop2025,
title={ProCrop: Learning Aesthetic Image Cropping from Professional Compositions},
author={Zhang, Ke and Ding, Tianyu and Jiang, Jiachen and Chen, Tianyi and Zharkov, Ilya and Patel, Vishal M. and Liang, Luming},
journal={arXiv preprint arXiv:2505.22490},
year={2025}
}
License
Apache 2.0. The model builds on Conditional DETR, RALF, and Segment Anything; please consult their respective licenses.