Update README.md
README.md
CHANGED
@@ -62,4 +62,115 @@ widget:
 </div>
 </div>
 
-<Gallery />
+<Gallery />
+
+# Model File and Inference Workflow
+
+## 📥 Download Links:
+
+- [squish_18.safetensors](./squish_18.safetensors) - LoRA Model File
+- [wan_img2video_lora_workflow.json](./workflow/wan_img2video_lora_workflow.json) - Wan I2V with LoRA Workflow for ComfyUI
+
+## Using with Diffusers
+```bash
+pip install git+https://github.com/huggingface/diffusers.git
+```
+
+```py
+import numpy as np
+import torch
+from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
+from diffusers.utils import export_to_video, load_image
+from transformers import CLIPVisionModel
+
+model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
+# The image encoder and VAE stay in float32; the rest of the pipeline runs in bfloat16.
+image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
+vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
+pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16)
+
+pipe.load_lora_weights("Remade-AI/Cakeify")
+
+# For low-VRAM environments: each sub-model is moved to the GPU only while it runs.
+# With enough VRAM, call pipe.to("cuda") instead.
+pipe.enable_model_cpu_offload()
+
+# The prompt follows the cakeify template and includes the "c4k3 cakeify" trigger phrase.
+prompt = "The video opens on a cat toy. A knife, held by a hand, is coming into frame and hovering over the cat toy. The knife then begins cutting into the cat toy to c4k3 cakeify it. As the knife slices the cat toy open, the inside of the cat toy is revealed to be cake with chocolate layers. The knife cuts through and the contents of the cat toy are revealed."
+
+image = load_image("https://huggingface.co/datasets/diffusers/cat_toy_example/resolve/main/1.jpeg")
+
+# Resize to roughly 480x832 pixels of area, keeping the aspect ratio and
+# snapping both sides down to a multiple of mod_value.
+max_area = 480 * 832
+aspect_ratio = image.height / image.width
+mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
+height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
+width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
+image = image.resize((width, height))
+
+output = pipe(
+    image=image,
+    prompt=prompt,
+    height=height,
+    width=width,
+    num_frames=81,
+    guidance_scale=5.0,
+    num_inference_steps=28,
+).frames[0]
+export_to_video(output, "output.mp4", fps=16)
+```
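The resizing arithmetic in the snippet above can be checked without loading the model. The sketch below is a hypothetical helper (`fit_resolution` is not part of diffusers) that reproduces the same computation with the standard library only. It assumes `mod_value` is 16, which is what `pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]` would give if the VAE downsamples by 8 and the spatial patch size is 2; treat that default as an assumption and read the real value from the loaded pipeline.

```python
import math


def fit_resolution(image_width, image_height, max_area=480 * 832, mod_value=16):
    """Return (width, height) close to max_area pixels, preserving aspect ratio.

    mod_value=16 is an assumed default, not a value confirmed by the README.
    Both sides are floored to a multiple of mod_value, as in the snippet above.
    """
    aspect_ratio = image_height / image_width
    height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return width, height


if __name__ == "__main__":
    # A landscape input that already matches the 832x480 target is left unchanged.
    print(fit_resolution(832, 480))
    # A square input is shrunk so that both sides are multiples of mod_value.
    print(fit_resolution(1024, 1024))
```

Because both sides are floored to a multiple of `mod_value`, the resized image can be slightly smaller than `max_area` and its aspect ratio can shift by a few pixels.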
+
+---
+<div style="background-color: #f8f9fa; padding: 20px; border-radius: 10px; margin-bottom: 20px;">
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Recommended Settings</h2>
+<ul style="margin-bottom: 0;">
+<li><b>LoRA Strength:</b> 1.0</li>
+<li><b>Embedded Guidance Scale:</b> 6.0</li>
+<li><b>Flow Shift:</b> 5.0</li>
+</ul>
+</div>
+
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Trigger Words</h2>
+<p>The key trigger phrase is: <code style="background-color: #f0f0f0; padding: 3px 6px; border-radius: 4px;">c4k3 cakeify</code></p>
+</div>
+
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Prompt Template</h2>
+<p>For best results, use this prompt structure:</p>
+<div style="background-color: #f0f0f0; padding: 12px; border-radius: 6px; margin: 10px 0;">
+<i>The video opens on a [object]. A knife, held by a hand, is coming into frame and hovering over the [object]. The knife then begins cutting into the [object] to c4k3 cakeify it. As the knife slices the [object] open, the inside of the [object] is revealed to be cake with chocolate layers. The knife cuts through and the contents of the [object] are revealed.</i>
+</div>
+<p>Simply replace <code style="background-color: #f0f0f0; padding: 3px 6px; border-radius: 4px;">[object]</code> with whatever you want to see cakeified!</p>
+</div>
+
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">ComfyUI Workflow</h2>
+<p>This LoRA works with a modified version of <a href="https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/main/example_workflows/wanvideo_480p_I2V_example_02.json" style="color: #0366d6; text-decoration: none;">Kijai's Wan Video Wrapper workflow</a>. The main modification is adding a Wan LoRA node connected to the base model.</p>
+<img src="./workflow/workflow_screenshot.png" style="width: 100%; border-radius: 8px; margin: 15px 0; box-shadow: 0 4px 8px rgba(0,0,0,0.1);">
+<p>See the Downloads section above for the modified workflow.</p>
+</div>
+</div>
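The prompt template above can also be filled in programmatically. The following is a minimal sketch, not part of the README: `CAKEIFY_TEMPLATE` and `build_cakeify_prompt` are hypothetical names, and the template text is copied from the Prompt Template box with `[object]` turned into a format placeholder.

```python
# Template text taken from the README's Prompt Template section;
# "{obj}" stands in for the "[object]" placeholder.
CAKEIFY_TEMPLATE = (
    "The video opens on a {obj}. A knife, held by a hand, is coming into frame "
    "and hovering over the {obj}. The knife then begins cutting into the {obj} "
    "to c4k3 cakeify it. As the knife slices the {obj} open, the inside of the "
    "{obj} is revealed to be cake with chocolate layers. The knife cuts through "
    "and the contents of the {obj} are revealed."
)


def build_cakeify_prompt(obj: str) -> str:
    """Fill the cakeify template with the name of the object to be cut."""
    obj = obj.strip()
    if not obj:
        raise ValueError("obj must be a non-empty description, e.g. 'puppy'")
    return CAKEIFY_TEMPLATE.format(obj=obj)


if __name__ == "__main__":
    print(build_cakeify_prompt("puppy"))
```

The returned string can be passed as `prompt` to the pipeline call. Keeping the template in one constant means the trigger phrase cannot be dropped by accident when only the object changes.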
+
+<div style="background-color: #f8f9fa; padding: 20px; border-radius: 10px; margin-bottom: 20px;">
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Model Information</h2>
+<p>The model weights are available in Safetensors format. See the Downloads section above.</p>
+</div>
+
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Training Details</h2>
+<ul style="margin-bottom: 0;">
+<li><b>Base Model:</b> Wan2.1 14B I2V 480p</li>
+<li><b>Training Data:</b> 1.5 minutes of video (20 short clips of things being squished)</li>
+<li><b>Epochs:</b> 18</li>
+</ul>
+</div>
+
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Additional Information</h2>
+<p>Training was done using <a href="https://github.com/tdrussell/diffusion-pipe" style="color: #0366d6; text-decoration: none;">Diffusion Pipe for Training</a></p>
+</div>
+
+<div style="background-color: white; padding: 15px; border-radius: 8px; margin: 15px 0; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
+<h2 style="color: #24292e; margin-top: 0;">Acknowledgments</h2>
+<p style="margin-bottom: 0;">Special thanks to Kijai for the ComfyUI Wan Video Wrapper and tdrussell for the training scripts!</p>
+</div>
+</div>