---
license: mit
tags:
- image-to-image
---

# Action2Vision: InstructPix2Pix Fine-tuning for Robotic Action Frame Prediction

GitHub: https://github.com/yutengzhang03/Action2Vision

## Example

To use the `InstructPix2Pix` pipeline, install `diffusers` together with `accelerate`, `safetensors`, and `transformers`:

```bash
pip install diffusers accelerate safetensors transformers
```

```python
import PIL.Image
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

model_id = "yutengz/Action2Vision"
# Load the fine-tuned weights in half precision; the safety checker is disabled.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, safety_checker=None
)
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)


def download_image(url):
    # Fetch the image, convert it to RGB, and resize it to 256x256.
    response = requests.get(url, stream=True)
    response.raise_for_status()
    return PIL.Image.open(response.raw).convert("RGB").resize((256, 256))


# `?raw=true` makes GitHub serve the image file rather than the HTML blob page.
url = "https://github.com/yutengzhang03/Action2Vision/blob/main/img/source.png?raw=true"
image = download_image(url)

prompt = "There is a hammer and a block in the middle of the table. If the block is closer to the left robotic arm, it uses the left arm to pick up the hammer and strike the block; otherwise, it does the opposite."
images = pipe(prompt, image=image).images
images[0]
```
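
The call above relies on the pipeline's default sampling settings. As a minimal sketch, the hypothetical helper below (`predict_frame` is not part of this repository) wraps the same call and exposes the step count, the two guidance scales, and an optional seed. The default values shown are illustrative assumptions, not settings tuned for this model.

```python
def predict_frame(pipe, image, prompt, steps=20, text_scale=7.5, image_scale=1.5, seed=None):
    """Hypothetical helper: run one InstructPix2Pix edit and return the first output image."""
    generator = None
    if seed is not None:
        # Imported lazily so the helper can be defined without torch installed.
        import torch

        # A seeded generator makes repeated runs produce the same frame.
        generator = torch.Generator(device="cpu").manual_seed(seed)

    result = pipe(
        prompt,
        image=image,
        num_inference_steps=steps,         # number of denoising steps
        guidance_scale=text_scale,         # how strongly to follow the text prompt
        image_guidance_scale=image_scale,  # how closely to stay to the source image
        generator=generator,
    )
    return result.images[0]


# Usage, assuming `pipe`, `image`, and `prompt` from the example above:
# frame = predict_frame(pipe, image, prompt, seed=0)
# frame.save("predicted_frame.png")
```

Raising `image_scale` tends to keep the output closer to the source frame, while raising `text_scale` pushes it toward the instruction, so the two are usually adjusted together.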