---
license: apache-2.0
tags:
- image-cropping
- aesthetic-cropping
- computer-vision
- retrieval-augmented
- conditional-detr
pipeline_tag: image-to-image
library_name: pytorch
datasets:
- BWGZK/procrop_dataset
language:
- en
---
# ProCrop: Learning Aesthetic Image Cropping from Professional Compositions
[Paper (arXiv)](https://arxiv.org/abs/2505.22490)
[Code (GitHub)](https://github.com/BWGZK-keke/ProCrop)
This is the **main supervised checkpoint** for the AAAI 2026 paper "ProCrop: Learning Aesthetic Image Cropping from Professional Compositions" by Zhang et al.
## Model Description
ProCrop is a retrieval-augmented framework for aesthetic image cropping that leverages professional photography compositions as guidance. Given a query image, ProCrop:
1. **Retrieves** compositionally similar professional images from a large database (AVA / CGL) using SAM embeddings and Faiss nearest-neighbor search.
2. **Fuses** retrieved features with the query via cross-attention.
3. **Predicts** diverse crop proposals ranked by aesthetic score using a Conditional DETR decoder.
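The retrieval step can be sketched as an exact nearest-neighbor search over normalized embeddings. This is a minimal NumPy stand-in for the Faiss index used in the repo; the function name, embedding dimensions, and database layout here are illustrative assumptions, not the repo's API:

```python
import numpy as np

def retrieve_top_k(query_emb, db_embs, k=5):
    """Return indices of the k most compositionally similar database entries.

    query_emb: (d,) embedding of the query image (e.g. a pooled SAM feature).
    db_embs:   (N, d) embedding database (e.g. AVA / CGL references).
    """
    # L2-normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                      # (N,) cosine similarities
    return np.argsort(-sims)[:k]       # indices of the top-k matches
```

In practice a Faiss index (e.g. `faiss.IndexFlatIP` over normalized vectors) computes the same ranking far faster at the scale of the AVA/CGL databases.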
## Reported Performance (FLMS supervised setting)
| Metric | Value |
|--------|-------|
| **IoU** | **0.843** |
| **BDE** (boundary displacement error) | **0.036** |
This checkpoint matches the FLMS row of Table 3 in the paper.
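For reference, the IoU metric above can be computed from a predicted and a ground-truth crop as follows (a minimal sketch; the `(x1, y1, x2, y2)` box format is an assumption):

```python
def crop_iou(a, b):
    """Intersection-over-union of two crop boxes given as (x1, y1, x2, y2)."""
    # Overlap rectangle (may be empty, in which case width/height clamp to 0)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```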
## Checkpoint Details
| Property | Value |
|----------|-------|
| File | `procrop_flms_supervised.pth` |
| Size | 512 MB |
| Original filename | `checkpoint0008200.8425250053405762.pth` |
| Trainable params | ~44.8M |
| Backbone | ResNet-50 (DC5) + Transformer encoder/decoder |
| Training data | CPCDataset (supervised) + AVA retrieval references |
| Evaluation | FLMS test set, IoU = 0.8425 |
| Training epoch | 83 |
| Crop queries | 24 (Conditional DETR style) |
## How to Use
### 1. Clone the GitHub repository
```bash
git clone https://github.com/BWGZK-keke/ProCrop.git
cd ProCrop
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
```
### 2. Download this checkpoint
```python
from huggingface_hub import hf_hub_download
ckpt_path = hf_hub_download(
    repo_id="BWGZK/ProCrop",
    filename="procrop_flms_supervised.pth",
)
```
Or with the CLI:
```bash
huggingface-cli download BWGZK/ProCrop procrop_flms_supervised.pth --local-dir ./checkpoints
```
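Once downloaded, the weights can be restored in the usual PyTorch way. The key layout below (`"model"` plus training metadata such as `"epoch"`) is an assumption based on common DETR-style checkpoints; inspect the real file's keys before relying on it. The snippet demonstrates the pattern with a stand-in module and a locally saved file:

```python
import torch
from torch import nn

# Stand-in for the real ProCrop model; saved locally to mimic the .pth layout.
net = nn.Linear(4, 2)
torch.save({"model": net.state_dict(), "epoch": 83}, "demo_ckpt.pth")

# map_location="cpu" lets the checkpoint load on machines without a GPU.
ckpt = torch.load("demo_ckpt.pth", map_location="cpu")
net.load_state_dict(ckpt["model"])  # unwrap the "model" key before loading
```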
### 3. Run inference on a single image
```bash
cd cropping
python test_singleimage.py \
    --dataset_root /path/to/your/images \
    --retrieval_cache_dir /path/to/retrieval_tables \
    --retrieval_img_dir /path/to/CGL_images \
    --resume ./checkpoints/procrop_flms_supervised.pth \
    --crop_savepath ./results
```
### 4. Evaluate on FLMS
```bash
cd cropping
python main_cpc.py \
    --dataset_root /path/to/FLMS \
    --retrieval_cache_dir /path/to/retrieval_tables \
    --resume ./checkpoints/procrop_flms_supervised.pth \
    --eval
```
You also need:
- **Precomputed retrieval tables** from [BWGZK/procrop_dataset](https://huggingface.co/datasets/BWGZK/procrop_dataset)
- **SAM ViT-B checkpoint** if training on GAIC/CAD: [download here](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth)
## Architecture
ProCrop extends **Conditional DETR** with a retrieval augmentation module:
- **Backbone**: ResNet-50 with dilated C5 stage
- **Encoder**: 6-layer transformer encoder for the query image
- **Retrieval fusion**: Cross-attention between query features and top-K retrieved SAM embeddings (64×256)
- **Decoder**: 6-layer transformer decoder with N=24 learnable crop queries
- **Heads**:
- 4-dim bounding-box MLP (3 layers)
- 1-dim aesthetic-score classification head (binary focal loss)
- **EMA self-distillation**: Mean-teacher framework for weakly-supervised training on CAD
Core implementation: [`cropping/models/conditional_detr_cpc.py`](https://github.com/BWGZK-keke/ProCrop/blob/main/cropping/models/conditional_detr_cpc.py)
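The retrieval-fusion step can be illustrated with a bare single-head cross-attention in NumPy. This is purely a sketch of the mechanism: the actual model applies learned query/key/value projections inside a transformer layer, which are omitted here:

```python
import numpy as np

def fuse_by_cross_attention(queries, retrieved):
    """Query-image features attend over flattened retrieved embeddings.

    queries:   (Nq, d) query-image tokens.
    retrieved: (Nr, d) retrieved SAM embedding tokens (e.g. K x 64 rows of dim 256).
    Returns a (Nq, d) retrieval-conditioned feature for each query token.
    """
    d = queries.shape[-1]
    scores = queries @ retrieved.T / np.sqrt(d)    # (Nq, Nr) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ retrieved                     # attention-weighted average
```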
## Related Resources
- **Code (GitHub)**: https://github.com/BWGZK-keke/ProCrop
- **Paper (arXiv)**: https://arxiv.org/abs/2505.22490
- **Dataset (HuggingFace)**: https://huggingface.co/datasets/BWGZK/procrop_dataset
- CAD dataset (242K weakly annotated images)
- Precomputed retrieval tables
- Pre-extracted SAM embedding databases
## Citation
```bibtex
@article{ProCrop2025,
title={ProCrop: Learning Aesthetic Image Cropping from Professional Compositions},
author={Zhang, Ke and Ding, Tianyu and Jiang, Jiachen and Chen, Tianyi and Zharkov, Ilya and Patel, Vishal M. and Liang, Luming},
journal={arXiv preprint arXiv:2505.22490},
year={2025}
}
```
## License
Apache 2.0. The model builds on [ConditionalDETR](https://github.com/Atten4Vis/ConditionalDETR), [RALF](https://github.com/CyberAgentAILab/RALF), and [Segment Anything](https://github.com/facebookresearch/segment-anything) — please consult their respective licenses.