Action2Vision: InstructPix2Pix Fine-tuning for Robotic Action Frame Prediction
GitHub: https://github.com/yutengzhang03/Action2Vision
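The sample input frame used in the example lives in this GitHub repository. A GitHub `blob/` URL returns the HTML file-viewer page, not the image bytes, so `PIL.Image.open` cannot decode it. If you fetch other frames from the repository, you can rewrite such URLs to `raw.githubusercontent.com` with a small helper. The `github_blob_to_raw` function below is hypothetical and is not part of Action2Vision or diffusers. It is a minimal sketch that assumes the standard `owner/repo/blob/ref/path` URL layout.

```python
from urllib.parse import urlparse, urlunparse


def github_blob_to_raw(url: str) -> str:
    """Rewrite a GitHub file-view URL (.../<owner>/<repo>/blob/<ref>/<path>)
    into the raw.githubusercontent.com URL that serves the file's bytes.

    Hypothetical helper: URLs that are not GitHub blob URLs are returned unchanged.
    """
    parts = urlparse(url)
    if parts.netloc != "github.com":
        return url
    # Expected path segments: owner, repo, "blob", ref, path...
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    # Query and fragment (e.g. "?raw=true") are dropped; the raw host needs neither
    return urlunparse(("https", "raw.githubusercontent.com", raw_path, "", "", ""))
```

For example, `github_blob_to_raw("https://github.com/yutengzhang03/Action2Vision/blob/main/img/source.png")` returns `"https://raw.githubusercontent.com/yutengzhang03/Action2Vision/main/img/source.png"`.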

Example
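To use InstructPix2Pix, install diffusers along with its dependencies. `StableDiffusionInstructPix2PixPipeline` is included in released versions of diffusers, so installing from `main` is no longer required.
```bash
pip install diffusers accelerate safetensors transformers
```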
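The example below loads the weights in `float16` and moves the pipeline to `"cuda"`, so it assumes an NVIDIA GPU. On other hardware, you can choose the device and precision at runtime instead. The `pick_device_and_dtype` function below is a hypothetical helper and is not part of diffusers or this repository. As a conservative assumption, the sketch uses `float32` on Apple `mps` and on CPU, because half precision is often slow or unsupported on CPU.

```python
def pick_device_and_dtype(cuda_available=None, mps_available=None):
    """Return (device, dtype_name) for loading the pipeline.

    Hypothetical helper: float16 on CUDA (as in the example), float32 elsewhere.
    Pass the availability flags explicitly, or leave them as None to query torch.
    """
    if cuda_available is None or mps_available is None:
        import torch  # imported lazily so the flags can also be supplied by the caller

        if cuda_available is None:
            cuda_available = torch.cuda.is_available()
        if mps_available is None:
            mps_available = torch.backends.mps.is_available()
    if cuda_available:
        return "cuda", "float16"
    if mps_available:
        return "mps", "float32"
    return "cpu", "float32"
```

To use it, replace the hardcoded values in the example with `torch_dtype=getattr(torch, dtype_name)` in `from_pretrained` and `pipe.to(device)`. Expect generation on CPU to be much slower than on a GPU.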
```python
import PIL.Image
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

model_id = "yutengz/Action2Vision"
# Load the fine-tuned weights in half precision on a CUDA GPU, without the safety checker
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")
# Swap in the Euler ancestral scheduler, reusing the model's scheduler config
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

def download_image(url):
    # Fetch the image over HTTP, convert it to RGB and resize it to 256x256
    return PIL.Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))

# Raw file URL: the GitHub "blob" page URL returns HTML, not the PNG bytes
url = "https://raw.githubusercontent.com/yutengzhang03/Action2Vision/main/img/source.png"
image = download_image(url)
prompt = "There is a hammer and a block in the middle of the table. If the block is closer to the left robotic arm, it uses the left arm to pick up the hammer and strike the block; otherwise, it does the opposite."
images = pipe(prompt, image=image).images
images[0]  # predicted frame as a PIL image (displayed automatically in a notebook)
```
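`StableDiffusionInstructPix2PixPipeline` also accepts `guidance_scale`, which sets how strongly the output follows the text instruction, and `image_guidance_scale`, which sets how closely it stays to the input frame. It also takes `num_inference_steps` and a `generator` for reproducible sampling. The Euler ancestral scheduler injects fresh noise at every step, so you should fix the seed when comparing settings. The `sweep_guidance` helper below is a hypothetical minimal sketch and is not part of Action2Vision. Its grid values and the reduced step count of 20 are illustrative assumptions, not tuned settings for this model.

```python
from itertools import product


def sweep_guidance(pipe, prompt, image, text_scales=(5.0, 7.5), image_scales=(1.0, 1.5),
                   num_inference_steps=20, make_generator=None):
    """Run an InstructPix2Pix pipeline over a grid of guidance settings.

    Returns {(guidance_scale, image_guidance_scale): first output image}.
    `make_generator` is an optional zero-argument callable that returns a fresh
    seeded generator, so every grid cell starts from the same noise.
    """
    results = {}
    for text_scale, image_scale in product(text_scales, image_scales):
        kwargs = {
            "image": image,
            "num_inference_steps": num_inference_steps,
            "guidance_scale": text_scale,         # adherence to the text instruction
            "image_guidance_scale": image_scale,  # closeness to the input frame
        }
        if make_generator is not None:
            kwargs["generator"] = make_generator()
        results[(text_scale, image_scale)] = pipe(prompt, **kwargs).images[0]
    return results
```

For example, `results = sweep_guidance(pipe, prompt, image, make_generator=lambda: torch.Generator(device="cuda").manual_seed(0))` returns four candidate frames keyed by `(guidance_scale, image_guidance_scale)`.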