How to use with the Diffusers library
# Install the dependencies first (shell command):
#   pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the weights in bfloat16, then move the pipeline to the GPU.
# Use "mps" instead of "cuda" on Apple silicon.
pipe = DiffusionPipeline.from_pretrained("anyeZHY/tesseract", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
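
The snippet above hard-codes "cuda" and asks the reader to switch to "mps" by hand. A minimal sketch of choosing the device at runtime is shown below; `pick_device` is a hypothetical helper name, not part of Diffusers, and the CPU fallback is an assumption that is likely too slow for practical video generation.

```python
import torch


def pick_device() -> str:
    """Return the best available torch device name (hypothetical helper)."""
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    if torch.backends.mps.is_available():
        return "mps"  # Apple silicon GPU
    return "cpu"  # fallback; assumed to be very slow for video models


device = pick_device()
print(f"Selected device: {device}")
```

The returned string can then be passed to the pipeline in place of the literal, for example `pipe.to(pick_device())`.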

TesserAct: Learning 4D Embodied World Models

Haoyu Zhen*, Qiao Sun*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan

Paper PDF  |  Project Page  |  Model on Hugging Face  |  Code

We propose TesserAct, a 4D embodied world model that takes input images and a text instruction and generates RGB, depth, and normal videos, enabling 4D scene reconstruction and action prediction.
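
To illustrate how a depth video relates to a 4D (3D plus time) scene, the sketch below lifts each depth frame to camera-space 3D points with a standard pinhole camera model. This is a generic back-projection, not the reconstruction procedure used by TesserAct. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `(T, H, W)` depth layout are all assumptions made for the example.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H, W, 3) array of camera-space points.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def depth_video_to_points(depth_video, fx, fy, cx, cy):
    """Back-project every frame of a (T, H, W) depth video to (T, H, W, 3)."""
    return np.stack([backproject_depth(d, fx, fy, cx, cy) for d in depth_video])


# Toy input: 2 frames of 3x4 pixels, constant depth of 2 units.
toy_depth = np.full((2, 3, 4), 2.0)
points = depth_video_to_points(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape)
```

Stacking the per-frame point sets along the time axis gives one simple view of a 4D scene; a real pipeline would also need calibrated intrinsics and would typically use the normal maps to refine the geometry.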
