base_model:
- SherryXTChen/LatentDiffusionDINOv2
datasets:
- timbrooks/instructpix2pix-clip-filtered
- SherryXTChen/InstructCLIP-InstructPix2Pix-Data
language:
- en
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning (CVPR 2025)
This model has been pushed to the Hub using the PyTorchModelHubMixin integration. It is based on the paper Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning.
arXiv | Image Editing Model | Data Refinement Model | Data
Capabilities
Installation
pip install -r requirements.txt
Inference
import requests
import torch
from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionInstructPix2PixPipeline
from PIL import Image, ImageOps

# Load the base InstructPix2Pix pipeline, then apply the Instruct-CLIP LoRA weights.
model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.load_lora_weights("SherryXTChen/InstructCLIP-InstructPix2Pix")
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

url = "https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/1_input.jpg"

def download_image(url):
    # Fetch the image, apply its EXIF orientation, and convert it to RGB.
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image = download_image(url)
prompt = "as a 3 d sculpture"  # the edit instruction
images = pipe(prompt, image=image, num_inference_steps=20).images
images[0].save("output.jpg")
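The example above passes the downloaded image to the pipeline at its original size. For large inputs, it may help to shrink the image first. The sketch below is not part of the original model card: `fit_size`, `max_side`, and `multiple` are hypothetical names, and the defaults of 512 and 8 are assumptions based on common practice for Stable Diffusion based pipelines, not values stated by the authors.

```python
def fit_size(width, height, max_side=512, multiple=8):
    """Return a (width, height) pair whose longer side is at most max_side
    and whose sides are both rounded down to a multiple of `multiple`.

    Hypothetical helper: the defaults are assumptions, not settings
    documented for this model.
    """
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic keeps the aspect ratio without float rounding surprises.
        width = width * max_side // longest
        height = height * max_side // longest
    # Round down to the nearest multiple, but never below one multiple.
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height
```

With the `image` object from the inference example, this could be applied as `image = image.resize(fit_size(*image.size))` before calling `pipe`.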
Citation
@misc{chen2025instructclipimprovinginstructionguidedimage,
title={Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning},
author={Sherry X. Chen and Misha Sra and Pradeep Sen},
year={2025},
eprint={2503.18406},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.18406},
}