---
license: mit
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
datasets:
- timbrooks/instructpix2pix-clip-filtered
- Aleksandar/Top-Bench-X
---

# EditCLIP: Representation Learning for Image Editing

[arXiv](https://arxiv.org/abs/2503.20318)
[Project Page](https://qianwangx.github.io/EditCLIP/)
[Code](https://github.com/QianWangX/EditCLIP)
[ICCV 2025](https://iccv2025.thecvf.com/)

## Abstract

We introduce EditCLIP, a novel representation-learning approach for image editing. Our method learns a unified representation of edits by jointly encoding an input image and its edited counterpart, effectively capturing their transformation. To evaluate its effectiveness, we employ EditCLIP to solve two tasks: exemplar-based image editing and automated edit evaluation. In exemplar-based image editing, we replace text-based instructions in InstructPix2Pix with EditCLIP embeddings computed from a reference exemplar image pair. Experiments demonstrate that our approach outperforms state-of-the-art methods while being more efficient and versatile. For automated evaluation, EditCLIP assesses image edits by measuring the similarity between the EditCLIP embedding of a given image pair and either a textual editing instruction or the EditCLIP embedding of another reference image pair. Experiments show that EditCLIP aligns more closely with human judgments than existing CLIP-based metrics, providing a reliable measure of edit quality and structural preservation.

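The automated evaluation described in the abstract comes down to comparing embeddings. The sketch below illustrates that comparison under loud assumptions: it uses cosine similarity (common for CLIP-style embeddings, but the abstract only says "similarity"), and the vectors are toy placeholders rather than real EditCLIP outputs. The names `cosine_similarity` and `rank_edits` are hypothetical helpers and are not part of the EditCLIP codebase.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def rank_edits(reference, candidates):
    """Return (name, score) pairs sorted by similarity to the reference, best first."""
    scores = {name: cosine_similarity(reference, emb) for name, emb in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy placeholder vectors (assumption): in practice the reference would be the
# embedding of a text instruction or of an exemplar image pair, and each
# candidate the embedding of an (input, edited) image pair.
reference = [0.9, 0.1, 0.0]
candidates = {
    "edit_a": [0.8, 0.2, 0.1],
    "edit_b": [0.0, 0.1, 0.9],
}
ranking = rank_edits(reference, candidates)
```

Here `ranking` lists the candidate edits from most to least similar to the reference, which is the form in which such a score would typically be compared against human judgments.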
## Benchmark

We evaluate EditCLIP using **Top-Bench-X**, a benchmark for image editing evaluation:

- **Dataset:** Top-Bench-X
- **Link:** https://huggingface.co/datasets/Aleksandar/Top-Bench-X

## Citation

```bibtex
@inproceedings{wang2025editclip,
  title={EditCLIP: Representation Learning for Image Editing},
  author={Wang, Qian and Cveji{\'c}, Aleksandar and Eldesokey, Abdelrahman and Wonka, Peter},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15960--15970},
  year={2025}
}
```