Yume: An Interactive World Generation Model
Paper: arXiv:2507.17744
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the pipeline in bfloat16. Switch "cuda" to "mps" on Apple devices.
pipe = DiffusionPipeline.from_pretrained(
    "stdstu123/Yume-I2V-540P", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Condition generation on a starting image and a text prompt.
prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
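If the full pipeline does not fit in GPU memory, diffusers pipelines generally support model CPU offloading via enable_model_cpu_offload. The sketch below is a minimal variant of the snippet above, assuming Yume-I2V-540P follows the standard DiffusionPipeline interface; the fps value passed to export_to_video is illustrative, not a Yume-specific setting.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "stdstu123/Yume-I2V-540P", torch_dtype=torch.bfloat16
)
# Keep submodules on CPU and move each to the GPU only while it runs;
# this trades some speed for a much smaller peak VRAM footprint.
pipe.enable_model_cpu_offload()

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
# export_to_video accepts an explicit frame rate; 16 fps here is an assumption.
export_to_video(output, "output.mp4", fps=16)
```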
This is a preview release of Yume, an interactive world generation model introduced in the paper "Yume: An Interactive World Generation Model".
Yume aims to create an interactive, realistic, and dynamic world from an input image, allowing exploration and control.
Project Page: https://stdstu12.github.io/YUME-Project/
GitHub Repository: https://github.com/stdstu12/YUME
For detailed instructions and full inference scripts, please refer to the GitHub repository.