base_model:
  - SherryXTChen/LatentDiffusionDINOv2
datasets:
  - timbrooks/instructpix2pix-clip-filtered
  - SherryXTChen/InstructCLIP-InstructPix2Pix-Data
language:
  - en
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin

Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning (CVPR 2025)

This model was pushed to the Hub using the PyTorchModelHubMixin integration. It is based on the paper Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning.

arXiv | Image Editing Model | Data Refinement Model | Data

Capabilities

Figure 1 Figure 2

Installation

pip install -r requirements.txt

Inference

import requests
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

# Load the base InstructPix2Pix pipeline, then apply the Instruct-CLIP LoRA weights.
model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.load_lora_weights("SherryXTChen/InstructCLIP-InstructPix2Pix")
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

def download_image(url):
    # Fetch the image, apply its EXIF orientation, and convert it to RGB.
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

url = "https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/1_input.jpg"
image = download_image(url)

# Edit the image according to the text instruction and save the first result.
prompt = "as a 3 d sculpture"
images = pipe(prompt, image=image, num_inference_steps=20).images
images[0].save("output.jpg")
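
If you use your own photos instead of the sample image, very large inputs can be slow or run out of GPU memory. The following is a minimal sketch of one way to pick a smaller target size before calling the pipeline. The helper name `fit_size`, the 512-pixel cap, and the rounding to multiples of 8 are assumptions for illustration and are not part of this repository; the values are chosen because Stable Diffusion 1.x style pipelines typically operate near 512 px with a VAE that downsamples by a factor of 8.

```python
def fit_size(width: int, height: int, max_side: int = 512, multiple: int = 8) -> tuple[int, int]:
    """Return a (width, height) whose longer side is at most max_side
    and whose sides are rounded down to a multiple of `multiple`.

    Hypothetical helper: the defaults are assumptions, not values
    prescribed by the Instruct-CLIP authors.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Shrink (never enlarge) so the longer side fits within max_side.
    longest = max(width, height)
    if longest > max_side:
        width = width * max_side // longest
        height = height * max_side // longest

    # Round each side down to the nearest multiple, but never below one multiple.
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height
```

With a PIL image, this could be applied as `image = image.resize(fit_size(*image.size))` before passing `image` to `pipe`. Rounding down slightly changes the aspect ratio, which is usually negligible at these sizes.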

Citation

@misc{chen2025instructclipimprovinginstructionguidedimage,
      title={Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning}, 
      author={Sherry X. Chen and Misha Sra and Pradeep Sen},
      year={2025},
      eprint={2503.18406},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18406}, 
}