---
base_model:
- SherryXTChen/LatentDiffusionDINOv2
datasets:
- timbrooks/instructpix2pix-clip-filtered
- SherryXTChen/InstructCLIP-InstructPix2Pix-Data
language:
- en
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# InstructCLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning (CVPR 2025)

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.
It is based on the paper [Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning](https://huggingface.co/papers/2503.18406).

[arXiv](http://arxiv.org/abs/2503.18406) | [Image Editing Model](https://huggingface.co/SherryXTChen/InstructCLIP-InstructPix2Pix) | [Data Refinement Model](https://huggingface.co/SherryXTChen/Instruct-CLIP) | [Data](https://huggingface.co/datasets/SherryXTChen/InstructCLIP-InstructPix2Pix-Data)


## Capabilities

<p align="center">
  <img src="https://github.com/SherryXTChen/Instruct-CLIP/blob/main/assets/teaser_1.png?raw=true" alt="Figure 1" width="43%">
  <img src="https://github.com/SherryXTChen/Instruct-CLIP/blob/main/assets/teaser_2.png?raw=true" alt="Figure 2" width="50%">
</p>

## Installation
Install the dependencies listed in the [GitHub repository](https://github.com/SherryXTChen/Instruct-CLIP):
```bash
pip install -r requirements.txt
```

## Inference

```python
import requests
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

# Load the base InstructPix2Pix pipeline and apply the Instruct-CLIP LoRA weights
model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.load_lora_weights("SherryXTChen/InstructCLIP-InstructPix2Pix")
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

def download_image(url):
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)  # honor EXIF orientation metadata
    return image.convert("RGB")

url = "https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/1_input.jpg"
image = download_image(url)

prompt = "as a 3 d sculpture"
images = pipe(prompt, image=image, num_inference_steps=20).images
images[0].save("output.jpg")
```
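If you edit your own images rather than the sample above, note that Stable Diffusion's VAE downsamples by a factor of 8, so both image dimensions should be divisible by 8. A minimal helper for snapping a size down to the nearest valid dimensions (a hypothetical utility, not part of this repository):

```python
def snap_to_multiple(size, multiple=8):
    """Round a (width, height) pair down to the nearest multiple.

    Stable Diffusion's VAE downsamples spatially by 8x, so both
    dimensions must be divisible by 8 for the pipeline to accept them.
    """
    w, h = size
    # Floor each dimension to the multiple, but never below one tile
    return (max(multiple, w - w % multiple), max(multiple, h - h % multiple))

print(snap_to_multiple((513, 770)))  # -> (512, 768)
```

Pass the result to `image.resize(...)` before handing the image to the pipeline.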

## Citation
```bibtex
@misc{chen2025instructclipimprovinginstructionguidedimage,
      title={Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning}, 
      author={Sherry X. Chen and Misha Sra and Pradeep Sen},
      year={2025},
      eprint={2503.18406},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18406}, 
}
```