---
title: Action2Vision
emoji: 🤖
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
---
# Action2Vision: InstructPix2Pix Fine-tuning for Robotic Action Frame Prediction
GitHub: https://github.com/yutengzhang03/Action2Vision
<img src='img/show-example.png'/>
## Example
To use the `InstructPix2Pix` pipeline, install `diffusers` along with `accelerate`, `safetensors`, and `transformers`:
```bash
pip install diffusers accelerate safetensors transformers
```
```python
import PIL
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

model_id = "yutengz/Action2Vision"

# Load the fine-tuned InstructPix2Pix pipeline in half precision.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

def download_image(url):
    # Fetch the source image and resize it to 256x256.
    image = PIL.Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))
    return image

# `?raw=true` makes GitHub serve the raw image bytes instead of the HTML page.
url = "https://github.com/yutengzhang03/Action2Vision/blob/main/img/source.png?raw=true"
image = download_image(url)

prompt = "There is a hammer and a block in the middle of the table. If the block is closer to the left robotic arm, it uses the left arm to pick up the hammer and strike the block; otherwise, it does the opposite."

# Predict the action frame conditioned on the source image and the instruction.
images = pipe(prompt, image=image).images
images[0]
```
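The pipeline also exposes the standard InstructPix2Pix arguments for trading off fidelity to the source frame against adherence to the instruction. Below is a minimal sketch of tuning them and saving the predicted frame; `num_inference_steps`, `guidance_scale`, and `image_guidance_scale` are regular `StableDiffusionInstructPix2PixPipeline` parameters, but the specific values shown are illustrative, not settings recommended by this repository.
```python
result = pipe(
    prompt,
    image=image,
    num_inference_steps=20,    # fewer steps run faster but may lose detail (illustrative value)
    guidance_scale=7.5,        # how strongly to follow the text instruction (illustrative value)
    image_guidance_scale=1.5,  # how strongly to stay close to the source frame (illustrative value)
)
result.images[0].save("predicted_frame.png")
```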