metadata
license: apache-2.0
language:
  - en
base_model:
  - Wan-AI/Wan2.1-I2V-14B-480P
  - Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
pipeline_tag: image-to-video
tags:
  - text-to-image
  - lora
  - diffusers
  - template:diffusion-lora
widget:
  - text: >-
      The video begins with a young anime character with long hair. 5en3m venom
      transformation. Transform into a venom character transformation. Venom is
      depicted with his iconic black symbiote body, large white eyes with black
      pupils, sharp teeth, and a menacing expression. The transformation is
      smooth and seamless, blending the human figure with the monstrous.
    output:
      url: example_videos/9_epoch40.mp4
  - text: >-
      The video begins with a woman wearing black clothes. 5en3m venom
      transformation. Transform into a venom character transformation. Venom is
      depicted with his iconic black symbiote body, large white eyes with black
      pupils, sharp teeth, and a menacing expression. The transformation is
      smooth and seamless, blending the human figure with the monstrous.
    output:
      url: example_videos/8_epoch40.mp4
  - text: >-
      The video begins with a man wearing a suit. 5en3m venom transformation.
      Transform into a venom character transformation. Venom is depicted with
      his iconic black symbiote body, large white eyes with black pupils, sharp
      teeth, and a menacing expression. The transformation is smooth and
      seamless, blending the human figure with the monstrous.
    output:
      url: example_videos/10_epoch40.mp4

Transform to Venom Effect LoRA for Wan2.1 14B I2V 480p

Overview

This LoRA is trained on the Wan2.1 14B I2V 480p model and transforms the subject of an input image into Venom, producing a video of the transformation. The effect works on a wide variety of subjects, from animals to vehicles to people!

Features

  • Transform any image into a video of its subject turning into Venom
  • Trained on the Wan2.1 14B 480p I2V base model
  • Consistent results across different object types
  • Simple prompt structure that's easy to adapt

Model File and Inference Workflow

📥 Download Links:

Using with Diffusers

pip install git+https://github.com/huggingface/diffusers.git
pip install transformers accelerate peft ftfy

import numpy as np
import torch
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)

# Note: the UniPC scheduler is chosen to generate higher-quality videos with Wan
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
scheduler = UniPCMultistepScheduler(
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    num_train_timesteps=1000,
    flow_shift=flow_shift,
)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16)
pipe.scheduler = scheduler  # replace the default scheduler with the UniPC one configured above

# Load the Transform2Venom LoRA on top of the base model
pipe.load_lora_weights("passenger12138/Transform2Venom")

# Move the whole pipeline to the GPU...
pipe.to("cuda")
# ...or, for low-VRAM environments, remove pipe.to("cuda") and use CPU offloading instead:
# pipe.enable_model_cpu_offload()

prompt = "The video begins with a man wearing a suit. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and seamless, blending the human figure with the monstrous."

image = load_image('./test_i2vlora_imgs/1.png')  # replace with the path or URL of your input image

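# Choose an output resolution of roughly 480*832 pixels that keeps the input's aspect ratio,
# rounding each side down to a multiple of (VAE spatial downsampling factor * transformer patch size)
max_area = 480 * 832
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))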

output = pipe(
    image=image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=28
).frames[0]
export_to_video(output, "output.mp4", fps=16)
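The resolution logic in the script above can be checked without loading the model. The sketch below reimplements it in plain Python; the function name `fit_resolution` is hypothetical, and `mod_value=16` is an assumption (a VAE spatial downsampling factor of 8 times a transformer patch size of 2 for Wan2.1), whereas the script reads the real values from `pipe` at runtime.

```python
import math


def fit_resolution(img_width: int, img_height: int,
                   max_area: int = 480 * 832, mod_value: int = 16) -> tuple[int, int]:
    """Return (height, width) close to max_area pixels, keeping the input aspect ratio.

    Both sides are rounded down to a multiple of mod_value, mirroring the diffusers
    script above. mod_value=16 is an assumption (VAE factor 8 * patch size 2).
    """
    aspect_ratio = img_height / img_width
    height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return height, width


print(fit_resolution(832, 480))    # landscape input -> (480, 832)
print(fit_resolution(1024, 1024))  # square input -> (624, 624)
```

A square input ends up at 624 × 624 rather than using the full pixel budget, because each side is rounded down to a multiple of 16.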

Recommended Settings

  • LoRA Strength: 1.0
  • Guidance Scale: 6.0
  • Flow Shift: 3.0
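The recommended settings can be collected in one place and applied to the diffusers script above. This is a minimal sketch: the class and helper names are hypothetical, and the `set_adapters` mapping for LoRA strength assumes the LoRA is loaded with an `adapter_name` (here the made-up name "venom").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenomLoraSettings:
    """Recommended settings from this card (class name is hypothetical)."""
    lora_strength: float = 1.0   # scale applied to the LoRA adapter weights
    guidance_scale: float = 6.0  # guidance scale passed to the pipeline call
    flow_shift: float = 3.0      # scheduler flow shift; 3.0 suits 480P


def call_kwargs(settings: VenomLoraSettings, steps: int = 28, num_frames: int = 81) -> dict:
    """Keyword arguments for the WanImageToVideoPipeline call in the script above."""
    return {
        "guidance_scale": settings.guidance_scale,
        "num_inference_steps": steps,
        "num_frames": num_frames,
    }


settings = VenomLoraSettings()
print(call_kwargs(settings))
# In the diffusers script, the other two values would be applied roughly as
# (adapter name "venom" is hypothetical):
#   UniPCMultistepScheduler(..., flow_shift=settings.flow_shift)
#   pipe.load_lora_weights("passenger12138/Transform2Venom", adapter_name="venom")
#   pipe.set_adapters(["venom"], adapter_weights=[settings.lora_strength])
```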

Trigger Words

The key trigger phrase is: 5en3m venom transformation.

Prompt Template

For best results, use this prompt structure:

The video begins with a [object]. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and seamless, blending the human figure with the monstrous.

Simply replace [object] with whatever you want to see transform into Venom!
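The template can also be filled programmatically. This is a small sketch; `build_prompt` and the constant names are hypothetical, and the template text is copied from this card (with the stray space before the final period removed), keeping the trigger phrase verbatim.

```python
TRIGGER_PHRASE = "5en3m venom transformation."

PROMPT_TEMPLATE = (
    "The video begins with a {subject}. 5en3m venom transformation. "
    "Transform into a venom character transformation. Venom is depicted with his "
    "iconic black symbiote body, large white eyes with black pupils, sharp teeth, "
    "and a menacing expression. The transformation is smooth and seamless, "
    "blending the human figure with the monstrous."
)


def build_prompt(subject: str) -> str:
    """Fill the [object] slot of the card's prompt template with a subject description."""
    return PROMPT_TEMPLATE.format(subject=subject.strip())


prompt = build_prompt("woman wearing black clothes")
print(prompt)
```

Since the template already supplies "a", pass the subject without an article (for example "man wearing a suit").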

ComfyUI Workflow

This LoRA works with a modified version of Kijai's Wan Video Wrapper workflow. The main modification is adding a Wan LoRA node connected to the base model.

Model Information

The model weights are available in Safetensors format. See the Download Links above.

Training Details

  • Base Model: Wan2.1 14B I2V 480p
  • Training Data: 1.5 minutes of video (40 short clips)
  • Epochs: 40
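For a sense of scale, the training figures above work out as follows. The 16 fps frame rate is an assumption borrowed from the inference example (`fps=16`); the card does not state the frame rate or batch settings of the training clips, so actual optimizer step counts in Diffusion Pipe may differ.

```python
total_seconds = 1.5 * 60   # 1.5 minutes of training video
num_clips = 40
epochs = 40
assumed_fps = 16           # assumption: same frame rate as the inference example

avg_clip_seconds = total_seconds / num_clips          # average length of one clip
avg_frames_per_clip = avg_clip_seconds * assumed_fps  # average frames per clip at 16 fps
clip_passes = num_clips * epochs                      # assuming each clip is seen once per epoch

print(f"average clip: {avg_clip_seconds:.2f} s, ~{avg_frames_per_clip:.0f} frames")
print(f"clip presentations over {epochs} epochs: {clip_passes}")
```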

Additional Information

Training was done using Diffusion Pipe.

Acknowledgments

Special thanks to Kijai for the ComfyUI Wan Video Wrapper, to tdrussell for the training scripts, and to RemadeAI for some of the cases!