Transform2Venom / README.md

passenger12138

metadata

license: apache-2.0
language:
  - en
base_model:
  - Wan-AI/Wan2.1-I2V-14B-480P
  - Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
pipeline_tag: image-to-video
tags:
  - text-to-image
  - lora
  - diffusers
  - template:diffusion-lora
widget:
  - text: >-
      "The video begins with a anime young character with long hair. 5en3m venom
      transformation. Transform into a venom character transformation. Venom is
      depicted with his iconic black symbiote body, large white eyes with black
      pupils, sharp teeth, and a menacing expression. The transformation is
      smooth and seamless, blending the human figure with the monstrous ."
    output:
      url: example_videos/9_epoch40.mp4
  - text: >-
      The video begins with a woman wearing black clothes. 5en3m venom
      transformation. Transform into a venom character transformation. Venom is
      depicted with his iconic black symbiote body, large white eyes with black
      pupils, sharp teeth, and a menacing expression. The transformation is
      smooth and seamless, blending the human figure with the monstrous .
    output:
      url: example_videos/8_epoch40.mp4
  - text: >-
      The video begins with a man wearing a suit. 5en3m venom transformation.
      Transform into a venom character transformation. Venom is depicted with
      his iconic black symbiote body, large white eyes with black pupils, sharp
      teeth, and a menacing expression.The transformation is smooth and
      seamless, blending the human figure with the monstrous .
    output:
      url: example_videos/10_epoch40.mp4

Transform to Venom Effect LoRA for Wan2.1 14B I2V 480p

Overview

This LoRA is trained on the Wan2.1 14B I2V 480p model and lets you transform any subject in an image into Venom. The effect works on a wide variety of subjects, from animals to vehicles to people!

Features

Transform the subject of any image into a video of it turning into Venom
Trained on the Wan2.1 14B 480p I2V base model
Consistent results across different object types
Simple prompt structure that's easy to adapt

Prompt: "The video begins with a anime young character with long hair. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and seamless, blending the human figure with the monstrous ."

Prompt: The video begins with a woman wearing black clothes. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and seamless, blending the human figure with the monstrous .

Prompt: The video begins with a man wearing a suit. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression.The transformation is smooth and seamless, blending the human figure with the monstrous .

Model File and Inference Workflow

📥 Download Links:

transform2venom.safetensors - LoRA Model File
wan_img2video_lora_workflow.json - Wan I2V with LoRA Workflow for ComfyUI

Using with Diffusers

pip install git+https://github.com/huggingface/diffusers.git

import numpy as np
import torch
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
# The image encoder and VAE are loaded in float32; the rest of the pipeline uses bfloat16.
image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)

# Note: choose the UniPC scheduler to generate higher-quality videos for Wan
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
scheduler = UniPCMultistepScheduler(
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    num_train_timesteps=1000,
    flow_shift=flow_shift,
)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16)
pipe.scheduler = scheduler  # attach the scheduler configured above
pipe.to("cuda")

pipe.load_lora_weights("passenger12138/Transform2Venom")

pipe.enable_model_cpu_offload()  # for low-VRAM environments

prompt = "The video begins with a man wearing a suit. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and seamless, blending the human figure with the monstrous ."

image = load_image('./test_i2vlora_imgs/1.png')

max_area = 480 * 832
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))

output = pipe(
    image=image,
    prompt=prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=28
).frames[0]
export_to_video(output, "output.mp4", fps=16)
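
The resizing step in the script above can be tried on its own, without loading the model. The sketch below reproduces the same arithmetic as a standalone function. The name `fit_to_area` is hypothetical, and the default `mod_value=16` is an assumption (a spatial VAE scale factor of 8 times a patch size of 2); in the real script the value is read from the pipeline.

```python
import math


def fit_to_area(width, height, max_area=480 * 832, mod_value=16):
    """Return (width, height) close to max_area pixels, keeping the aspect ratio.

    Both dimensions are rounded down to a multiple of mod_value.
    mod_value=16 is an assumed default; the script above computes it from
    pipe.vae_scale_factor_spatial and pipe.transformer.config.patch_size.
    """
    aspect_ratio = height / width
    new_height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    new_width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return new_width, new_height


print(fit_to_area(832, 480))    # landscape input that already matches the target area
print(fit_to_area(1024, 1024))  # square input, scaled down to roughly the same area
```

Rounding down to a multiple of `mod_value` means the output area can be slightly below `max_area`, never above it by more than rounding error.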

Recommended Settings

LoRA Strength: 1.0
Embedded Guidance Scale: 6.0
Flow Shift: 3.0
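
In Diffusers, the LoRA strength is not a pipeline argument; one way to set it is through a named adapter. The following is a minimal sketch, assuming a recent Diffusers release with PEFT-backed LoRA support, where `load_lora_weights` accepts `adapter_name` and `set_adapters` accepts `adapter_weights`. The function name `load_venom_lora` and the adapter name `"venom"` are hypothetical.

```python
def load_venom_lora(pipe, strength=1.0, adapter_name="venom"):
    """Load the LoRA into an existing pipeline and set its strength.

    `pipe` is expected to be a WanImageToVideoPipeline built as in the
    Diffusers example. The adapter name is an arbitrary label chosen here.
    """
    pipe.load_lora_weights("passenger12138/Transform2Venom", adapter_name=adapter_name)
    # adapter_weights scales the LoRA contribution; 1.0 is the recommended strength.
    pipe.set_adapters([adapter_name], adapter_weights=[strength])
    return pipe
```

Calling `load_venom_lora(pipe)` would replace the plain `pipe.load_lora_weights(...)` call; lowering `strength` should weaken the effect, though values other than 1.0 are not covered by the recommendations above.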

Trigger Words

The key trigger phrase is: 5en3m venom transformation.

Prompt Template

For best results, use this prompt structure:

The video begins with a [object]. 5en3m venom transformation. Transform into a venom character transformation. Venom is depicted with his iconic black symbiote body, large white eyes with black pupils, sharp teeth, and a menacing expression. The transformation is smooth and seamless, blending the human figure with the monstrous .

Simply replace [object] with whatever you want to see transform into Venom!
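
Filling the template by hand is easy to get wrong, so here is a small sketch that builds the prompt from a subject description. The names `TRIGGER`, `TEMPLATE`, and `build_prompt` are hypothetical helpers, not part of this repository. The template text is copied verbatim from above, including the spacing at the end, on the assumption that staying close to the original wording gives the most reliable results.

```python
TRIGGER = "5en3m venom transformation."

# Prompt template copied verbatim; {subject} stands in for [object].
TEMPLATE = (
    "The video begins with a {subject}. {trigger} "
    "Transform into a venom character transformation. "
    "Venom is depicted with his iconic black symbiote body, large white eyes "
    "with black pupils, sharp teeth, and a menacing expression. "
    "The transformation is smooth and seamless, blending the human figure "
    "with the monstrous ."
)


def build_prompt(subject):
    """Return the full prompt for a subject such as 'man wearing a suit'."""
    subject = subject.strip()
    if not subject:
        raise ValueError("subject must be a non-empty description")
    return TEMPLATE.format(subject=subject, trigger=TRIGGER)


print(build_prompt("man wearing a suit"))
```

The subject is written without a leading article, because the template already supplies "a".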

ComfyUI Workflow

This LoRA works with a modified version of Kijai's Wan Video Wrapper workflow. The main modification is adding a Wan LoRA node connected to the base model.

Model Information

The model weights are available in Safetensors format. See the Download Links above.

Training Details

Base Model: Wan2.1 14B I2V 480p
Training Data: 1.5 minutes of video (40 short clips)
Epochs: 40

Additional Information

Training was done using Diffusion Pipe.

Acknowledgments

Special thanks to Kijai for the ComfyUI Wan Video Wrapper, tdrussell for the training scripts, and RemadeAI for some of the example cases!