---
license: mit
tags:
- image-to-image
---
# Action2Vision: InstructPix2Pix Fine-tuning for Robotic Action Frame Prediction
GitHub: https://github.com/yutengzhang03/Action2Vision
<img src='img/show-example.png'/>



## Example

To use `InstructPix2Pix`, install `diffusers` and its dependencies. If the pipeline is not yet part of the latest release, install `diffusers` from `main`:

```bash
pip install diffusers accelerate safetensors transformers
# if the pipeline is not yet in a release, install diffusers from main instead:
# pip install git+https://github.com/huggingface/diffusers
```

```python
import PIL
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

model_id = "yutengz/Action2Vision"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)


def download_image(url):
    # Fetch an image over HTTP and resize it to the model's 256x256 input resolution.
    return PIL.Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))

url = "https://github.com/yutengzhang03/Action2Vision/blob/main/img/source.png?raw=true"  # ?raw=true fetches the image itself, not the GitHub HTML page
image = download_image(url)
prompt = "There is a hammer and a block in the middle of the table. If the block is closer to the left robotic arm, it uses the left arm to pick up the hammer and strike the block; otherwise, it does the opposite."
images = pipe(prompt, image=image).images
images[0]
```
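The pipeline above expects inputs at the 256×256 resolution used during fine-tuning. A minimal, self-contained sketch of that preprocessing step (the `preprocess` helper name is illustrative, not part of the repository):

```python
from PIL import Image

def preprocess(image, size=256):
    # Convert to RGB and resize to the 256x256 resolution used in the example above.
    return image.convert("RGB").resize((size, size), Image.LANCZOS)

# Synthetic stand-in for a downloaded source frame:
frame = Image.new("RGB", (640, 480), "gray")
print(preprocess(frame).size)  # (256, 256)
```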