lllyasviel/FramePackI2V_HY

linoyts (HF Staff) committed on May 12, 2025

Commit b69b380 (verified) · 1 parent: 86cef43

Add README - relevant links and inference example

Files changed (1)

README.md +58 -1

@@ -1,5 +1,62 @@
 ---
 library_name: diffusers
 ---
-This is the `f1k1_x_g9_f1k1f2k2f16k4_td` FramePack for HY

 ---
 library_name: diffusers
+pipeline_tag: image-to-video
 ---
+# FramePack - Video diffusion, but feels like image diffusion
+[*Packing Input Frame Context in Next-Frame Prediction Models for Video Generation*](https://lllyasviel.github.io/frame_pack_gitpage/)
+[**arxiv**](https://arxiv.org/abs/2504.12626), [**code**](https://github.com/lllyasviel/FramePack)
+---
+This repo contains pre-trained `f1k1_x_g9_f1k1f2k2f16k4_td` weights and 🧨 `diffusers` inference code for FramePack for Hunyuan Video.
+## Inference with 🧨 Diffusers
+```
+import torch
+from diffusers import HunyuanVideoFramepackPipeline, HunyuanVideoFramepackTransformer3DModel
+from diffusers.hooks import apply_group_offloading
+from diffusers.utils import export_to_video, load_image
+from transformers import SiglipImageProcessor, SiglipVisionModel
+transformer = HunyuanVideoFramepackTransformer3DModel.from_pretrained(
+    "lllyasviel/FramePackI2V_HY", torch_dtype=torch.bfloat16
+)
+feature_extractor = SiglipImageProcessor.from_pretrained(
+    "lllyasviel/flux_redux_bfl", subfolder="feature_extractor"
+)
+image_encoder = SiglipVisionModel.from_pretrained(
+    "lllyasviel/flux_redux_bfl", subfolder="image_encoder", torch_dtype=torch.float16
+)
+pipe = HunyuanVideoFramepackPipeline.from_pretrained(
+    "hunyuanvideo-community/HunyuanVideo",
+    transformer=transformer,
+    feature_extractor=feature_extractor,
+    image_encoder=image_encoder,
+    torch_dtype=torch.float16,
+)
+onload_device = torch.device("cuda")
+offload_device = torch.device("cpu")
+for module in [pipe.text_encoder, pipe.text_encoder_2, pipe.transformer]:
+    apply_group_offloading(
+        module, onload_device, offload_device, offload_type="leaf_level", use_stream=True, low_cpu_mem_usage=True
+    )
+pipe.image_encoder.to(onload_device)
+pipe.vae.to(onload_device)
+pipe.vae.enable_tiling()
+image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/penguin.png")
+output = pipe(
+    image=image,
+    prompt="A penguin dancing in the snow",
+    height=832,
+    width=480,
+    num_frames=91,
+    num_inference_steps=30,
+    guidance_scale=9.0,
+    generator=torch.Generator().manual_seed(0),
+).frames[0]
+print(f"Max memory: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")
+export_to_video(output, "output.mp4", fps=30)
+```
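
Review note: the example requests `num_frames=91` and exports at `fps=30`, so the resulting clip should run for roughly three seconds. The memory line divides by `1024**3`, so the figure it labels "GB" is strictly GiB. The sketch below only illustrates that arithmetic; the helper names `clip_duration_seconds` and `bytes_to_gib` are hypothetical and are not part of `diffusers` or of this commit.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip with num_frames frames shown at fps frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    # Values taken from the README example: num_frames=91, fps=30.
    print(f"Clip length: {clip_duration_seconds(91, 30):.2f} s")
    # Hypothetical peak-memory reading, used only to show the formatting.
    print(f"Max memory: {bytes_to_gib(8 * 1024**3):.3f} GiB")
```

Changing `num_frames` or `fps` in the README example scales the clip length accordingly, assuming the pipeline returns exactly the number of frames requested.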