no@email.com committed · Commit 4b40584 · 0 parent(s)

Initial commit
Files changed (8)
  1. .gitattributes +36 -0
  2. .gitignore +43 -0
  3. README.md +15 -0
  4. USAGE.md +166 -0
  5. aoti.py +35 -0
  6. app.py +263 -0
  7. example.py +294 -0
  8. requirements.txt +11 -0
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.JPG filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,43 @@
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+ *.mp4
+ *.avi
+ *.mov
+ output_video.*
+ *.log
+ .DS_Store
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+ gradio_cached_examples/
+ flagged/
+
+
+ *.JPG
README.md ADDED
@@ -0,0 +1,15 @@
+ ---
+ title: Dream-wan2-2-faster-Pro
+ emoji: 🎬
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.44.1 # matches requirements.txt
+ app_file: app.py
+ pinned: true
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # Dream-wan2-2-faster-Pro
+ An ultra-fast, realistic image-to-video generator powered by Wan2.2 with Lightning LoRA and AoT compilation for ZeroGPU.
USAGE.md ADDED
@@ -0,0 +1,166 @@
+ # Usage Guide - WAN 2.2 Image-to-Video LoRA Demo
+
+ ## Quick Start
+
+ ### 1. Deploying to Hugging Face Spaces
+
+ To deploy this demo to Hugging Face Spaces:
+
+ ```bash
+ # Install git-lfs if not already installed
+ git lfs install
+
+ # Create a new Space on huggingface.co
+ # Then clone your Space repository
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
+ cd YOUR_SPACE_NAME
+
+ # Copy the demo files into the clone (adjust the source path to wherever you downloaded this demo)
+ cp -r /path/to/this-demo/* .
+
+ # Commit and push
+ git add .
+ git commit -m "Initial commit: WAN 2.2 Image-to-Video LoRA Demo"
+ git push
+ ```
+
+ ### 2. Running Locally
+
+ ```bash
+ # Create a virtual environment
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the app
+ python app.py
+ ```
+
+ The app will be available at `http://localhost:7860`.
+
+ ## Using the Demo
+
+ ### Basic Usage
+
+ 1. **Upload Image**: Click the image upload area and select an image file
+ 2. **Enter Prompt**: Type a description of the motion you want (e.g., "A person walking forward, cinematic")
+ 3. **Click Generate**: Wait for the video to be generated (the first run downloads the model)
+ 4. **View Result**: The generated video appears in the output area
+
+ ### Advanced Settings
+
+ Expand the "Advanced Settings" accordion to access:
+
+ - **Inference Steps** (20-100): More steps = higher quality but slower generation
+   - 20-30: Fast, lower quality
+   - 50: Balanced (recommended)
+   - 80-100: Slow, highest quality
+
+ - **Guidance Scale** (1.0-15.0): How closely to follow the prompt
+   - 1.0-3.0: More creative, less faithful to the prompt
+   - 6.0: Balanced (recommended)
+   - 10.0-15.0: Very faithful to the prompt, less creative
+
+ - **Use LoRA**: Enable/disable LoRA fine-tuning
+
+ - **LoRA Type**:
+   - **High-Noise**: Best for dynamic, action-heavy scenes
+   - **Low-Noise**: Best for subtle, smooth motion
+
+ ## Example Prompts
+
+ ### Good Prompts
+
+ - "A cat walking through a garden, sunny day, high quality"
+ - "Waves crashing on a beach, sunset lighting, cinematic"
+ - "A car driving down a highway, fast motion, 4k"
+ - "Smoke rising from a campfire, slow motion"
+
+ ### Tips for Better Results
+
+ 1. **Be Specific**: Include details about motion, lighting, and quality
+ 2. **Use Keywords**: "cinematic", "high quality", "4k", "smooth"
+ 3. **Describe Motion**: Clearly state what should move and how
+ 4. **Consider Style**: Add style descriptors like "photorealistic" or "animated"
+
+ ## Troubleshooting
+
+ ### Out of Memory Error
+
+ If you encounter OOM errors:
+
+ 1. The model requires significant VRAM (16GB+ recommended)
+ 2. On Hugging Face Spaces, ensure you're using at least `gpu-medium` hardware
+ 3. For local runs, try reducing the number of frames or using CPU offloading (see the sketch below)
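+
+ A minimal sketch of option 3, assuming you load the pipeline yourself on a low-VRAM machine rather than running `app.py` unchanged (the Space build keeps everything on the GPU); the image path and frame count below are placeholders:
+
+ ```python
+ import torch
+ from PIL import Image
+ from diffusers.pipelines.wan.pipeline_wan_i2v import WanImageToVideoPipeline  # same import app.py uses
+
+ pipe = WanImageToVideoPipeline.from_pretrained(
+     "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_model_cpu_offload()  # keep submodules on CPU, move each to the GPU only while it runs
+
+ image = Image.open("wan_i2v_input.JPG")  # placeholder input image
+ frames = pipe(
+     image=image,
+     prompt="A cat walking through a garden, sunny day, high quality",
+     num_frames=33,            # fewer frames lowers peak memory (33 frames is about 2 s at 16 fps)
+     num_inference_steps=6,
+ ).frames[0]
+ ```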
+
+ ### Slow Generation
+
+ - The first generation is slower (the model weights are downloaded)
+ - Reduce inference steps for faster results
+ - Ensure the GPU is being used (check the Space or console logs)
+
+ ### Model Not Loading
+
+ If the model fails to load:
+
+ 1. Check your internet connection (the model weights are several tens of GB)
+ 2. Ensure sufficient disk space
+ 3. For Hugging Face Spaces, check your Space's logs
+
+ ## Customization
+
+ ### Using Your Own LoRA Files
+
+ To use your own LoRA weights:
+
+ 1. Upload your LoRA `.safetensors` files to Hugging Face
+ 2. Update the LoRA repository and weight-file references in `app.py`:
+
+ ```python
+ HIGH_NOISE_LORA_URL = "https://huggingface.co/YOUR_USERNAME/YOUR_REPO/resolve/main/your_lora.safetensors"
+ ```
+
+ 3. Adjust the `pipe.load_lora_weights(...)` and `fuse_lora(...)` calls near the top of `app.py` to use your weights (they run at module level rather than inside `generate_video`); see the sketch below
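+
+ For reference, this is the pattern `app.py` already uses; the repository, file, and adapter names below are placeholders for your own weights:
+
+ ```python
+ pipe.load_lora_weights(
+     "YOUR_USERNAME/YOUR_REPO",
+     weight_name="your_high_noise_lora.safetensors",
+     adapter_name="my_lora",
+ )
+ pipe.load_lora_weights(
+     "YOUR_USERNAME/YOUR_REPO",
+     weight_name="your_low_noise_lora.safetensors",
+     adapter_name="my_lora_2",
+     load_into_transformer_2=True,  # Wan 2.2 has separate high-noise and low-noise transformers
+ )
+ pipe.set_adapters(["my_lora", "my_lora_2"], adapter_weights=[1.0, 1.0])
+ pipe.fuse_lora(adapter_names=["my_lora"], lora_scale=1.0, components=["transformer"])
+ pipe.fuse_lora(adapter_names=["my_lora_2"], lora_scale=1.0, components=["transformer_2"])
+ pipe.unload_lora_weights()  # adapters are fused into the weights, so they can be released
+ ```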
+
+ ### Changing the Model
+
+ To use a different model:
+
+ 1. Update `MODEL_ID` in `app.py`
+ 2. Ensure the model is compatible with `WanImageToVideoPipeline` (the pipeline class `app.py` loads)
+ 3. Adjust the quantization and AoT optimizations if needed (a sketch follows below)
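+
+ As a rough sketch (the model ID below is a placeholder; whatever checkpoint you substitute must provide the `transformer` and `transformer_2` subfolders that `WanImageToVideoPipeline` expects):
+
+ ```python
+ MODEL_ID = "YOUR_ORG/YOUR_WAN22_VARIANT"  # placeholder
+
+ pipe = WanImageToVideoPipeline.from_pretrained(
+     MODEL_ID,
+     transformer=WanTransformer3DModel.from_pretrained(
+         MODEL_ID, subfolder="transformer", torch_dtype=torch.bfloat16
+     ),
+     transformer_2=WanTransformer3DModel.from_pretrained(
+         MODEL_ID, subfolder="transformer_2", torch_dtype=torch.bfloat16
+     ),
+     torch_dtype=torch.bfloat16,
+ )
+ # If the new checkpoint is not FP8-friendly, also drop the quantize_() calls
+ # and the aoti.aoti_blocks_load(...) lines in app.py.
+ ```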
+
+ ## Performance Notes
+
+ - **GPU (A10G/T4)**: ~2-3 minutes per video
+ - **GPU (A100)**: ~1-2 minutes per video
+ - **CPU**: Not recommended (20+ minutes)
+
+ ## API Access
+
+ For programmatic access, you can use the Gradio Client:
+
+ ```python
+ from gradio_client import Client
+
+ client = Client("YOUR_USERNAME/YOUR_SPACE_NAME")
+ result = client.predict(
+     input_image="path/to/image.jpg",
+     prompt="A cat walking",
+     api_name="/generate_video"
+ )
+ ```
+
+ ## Credits
+
+ - Model: Wan2.2 by Wan-AI
+ - Framework: Hugging Face Diffusers
+ - Interface: Gradio
+
+ ## License
+
+ Apache 2.0 - See LICENSE file for details
+
+
aoti.py ADDED
@@ -0,0 +1,35 @@
+ """Attach ahead-of-time (AoT) compiled transformer blocks, downloaded from the
+ Hugging Face Hub, to a diffusers module for use on ZeroGPU Spaces."""
+
+ from typing import cast
+
+ import torch
+ from huggingface_hub import hf_hub_download
+ from spaces.zero.torch.aoti import ZeroGPUCompiledModel
+ from spaces.zero.torch.aoti import ZeroGPUWeights
+ from torch._functorch._aot_autograd.subclass_parametrization import unwrap_tensor_subclass_parameters
+
+
+ def _shallow_clone_module(module: torch.nn.Module) -> torch.nn.Module:
+     clone = object.__new__(module.__class__)
+     clone.__dict__ = module.__dict__.copy()
+     clone._parameters = module._parameters.copy()
+     clone._buffers = module._buffers.copy()
+     clone._modules = {k: _shallow_clone_module(v) for k, v in module._modules.items() if v is not None}
+     return clone
+
+
+ def aoti_blocks_load(module: torch.nn.Module, repo_id: str, variant: str | None = None):
+     repeated_blocks = cast(list[str], module._repeated_blocks)
+     aoti_files = {name: hf_hub_download(
+         repo_id=repo_id,
+         filename='package.pt2',
+         subfolder=name if variant is None else f'{name}.{variant}',
+     ) for name in repeated_blocks}
+     for block_name, aoti_file in aoti_files.items():
+         for block in module.modules():
+             if block.__class__.__name__ == block_name:
+                 block_ = _shallow_clone_module(block)
+                 unwrap_tensor_subclass_parameters(block_)
+                 weights = ZeroGPUWeights(block_.state_dict())
+                 block.forward = ZeroGPUCompiledModel(aoti_file, weights)
app.py ADDED
@@ -0,0 +1,263 @@
+ import os
+ import spaces
+ import torch
+ from diffusers.pipelines.wan.pipeline_wan_i2v import WanImageToVideoPipeline
+ from diffusers.models.transformers.transformer_wan import WanTransformer3DModel
+ from diffusers.utils.export_utils import export_to_video
+ import gradio as gr
+ import tempfile
+ import numpy as np
+ from PIL import Image
+ import random
+ import gc
+
+ from torchao.quantization import quantize_
+ from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, Int8WeightOnlyConfig
+ import aoti
+
+ # =========================================================
+ # MODEL CONFIGURATION
+ # =========================================================
+ MODEL_ID = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"
+ HF_TOKEN = os.environ.get("HF_TOKEN")
+
+ MAX_DIM = 832
+ MIN_DIM = 480
+ SQUARE_DIM = 640
+ MULTIPLE_OF = 16
+
+ MAX_SEED = np.iinfo(np.int32).max
+
+ FIXED_FPS = 16
+ MIN_FRAMES_MODEL = 8
+ MAX_FRAMES_MODEL = 7720
+
+ MIN_DURATION = round(MIN_FRAMES_MODEL / FIXED_FPS, 1)
+ MAX_DURATION = round(MAX_FRAMES_MODEL / FIXED_FPS, 1)
+
+ # =========================================================
+ # LOAD PIPELINE
+ # =========================================================
+ pipe = WanImageToVideoPipeline.from_pretrained(
+     MODEL_ID,
+     transformer=WanTransformer3DModel.from_pretrained(
+         MODEL_ID,
+         subfolder="transformer",
+         torch_dtype=torch.bfloat16,
+         device_map="cuda",
+         token=HF_TOKEN
+     ),
+     transformer_2=WanTransformer3DModel.from_pretrained(
+         MODEL_ID,
+         subfolder="transformer_2",
+         torch_dtype=torch.bfloat16,
+         device_map="cuda",
+         token=HF_TOKEN
+     ),
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+
+ # =========================================================
+ # LOAD LORA ADAPTERS
+ # =========================================================
+ pipe.load_lora_weights(
+     "Kijai/WanVideo_comfy",
+     weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
+     adapter_name="lightx2v"
+ )
+ pipe.load_lora_weights(
+     "obsxrver/wan2.2-i2v-scat",
+     weight_name="WAN2.2-I2V-HighNoise_scat-xxi-i2v.safetensors",
+     adapter_name="i2v_scat"
+ )
+ pipe.load_lora_weights(
+     "Kijai/WanVideo_comfy",
+     weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
+     adapter_name="lightx2v_2",
+     load_into_transformer_2=True
+ )
+ pipe.load_lora_weights(
+     "obsxrver/wan2.2-i2v-scat",
+     weight_name="WAN2.2-I2V-LowNoise_scat-xxi-i2v.safetensors",
+     adapter_name="i2v_scat_2"
+ )
+
+ pipe.set_adapters(["lightx2v", "i2v_scat", "lightx2v_2", "i2v_scat_2"], adapter_weights=[1., 1., 1., 1.])
+ pipe.fuse_lora(adapter_names=["lightx2v"], lora_scale=3., components=["transformer"])
+ pipe.fuse_lora(adapter_names=["i2v_scat"], lora_scale=1., components=["transformer"])
+ pipe.fuse_lora(adapter_names=["lightx2v_2"], lora_scale=1., components=["transformer_2"])
+ pipe.fuse_lora(adapter_names=["i2v_scat_2"], lora_scale=1., components=["transformer_2"])
+ pipe.unload_lora_weights()
+
+ # =========================================================
+ # QUANTIZATION & AOT OPTIMIZATION
+ # =========================================================
+ quantize_(pipe.text_encoder, Int8WeightOnlyConfig())
+ quantize_(pipe.transformer, Float8DynamicActivationFloat8WeightConfig())
+ quantize_(pipe.transformer_2, Float8DynamicActivationFloat8WeightConfig())
+
+ aoti.aoti_blocks_load(pipe.transformer, 'zerogpu-aoti/Wan2', variant='fp8da')
+ aoti.aoti_blocks_load(pipe.transformer_2, 'zerogpu-aoti/Wan2', variant='fp8da')
+
+ # =========================================================
+ # DEFAULT PROMPTS
+ # =========================================================
+ default_prompt_i2v = "make this image come alive, cinematic motion, smooth animation"
+ default_negative_prompt = (  # standard quality/artifact negative prompt (in Chinese)
+     "色调艳丽, 过曝, 静态, 细节模糊不清, 字幕, 风格, 作品, 画作, 画面, 静止, 整体发灰, "
+     "最差质量, 低质量, JPEG压缩残留, 丑陋的, 残缺的, 多余的手指, 画得不好的手部, 画得不好的脸部, "
+     "畸形的, 毁容的, 形态畸形的肢体, 手指融合, 静止不动的画面, 杂乱的背景, 三条腿, 背景人很多, 倒着走"
+ )
+
+ # =========================================================
+ # IMAGE RESIZING LOGIC
+ # =========================================================
+ def resize_image(image: Image.Image) -> Image.Image:
+     width, height = image.size
+     if width == height:
+         return image.resize((SQUARE_DIM, SQUARE_DIM), Image.LANCZOS)
+
+     aspect_ratio = width / height
+     MAX_ASPECT_RATIO = MAX_DIM / MIN_DIM
+     MIN_ASPECT_RATIO = MIN_DIM / MAX_DIM
+
+     image_to_resize = image
+
+     if aspect_ratio > MAX_ASPECT_RATIO:
+         crop_width = int(round(height * MAX_ASPECT_RATIO))
+         left = (width - crop_width) // 2
+         image_to_resize = image.crop((left, 0, left + crop_width, height))
+     elif aspect_ratio < MIN_ASPECT_RATIO:
+         crop_height = int(round(width / MIN_ASPECT_RATIO))
+         top = (height - crop_height) // 2
+         image_to_resize = image.crop((0, top, width, top + crop_height))
+
+     if width > height:
+         target_w = MAX_DIM
+         target_h = int(round(target_w / aspect_ratio))
+     else:
+         target_h = MAX_DIM
+         target_w = int(round(target_h * aspect_ratio))
+
+     final_w = round(target_w / MULTIPLE_OF) * MULTIPLE_OF
+     final_h = round(target_h / MULTIPLE_OF) * MULTIPLE_OF
+
+     final_w = max(MIN_DIM, min(MAX_DIM, final_w))
+     final_h = max(MIN_DIM, min(MAX_DIM, final_h))
+
+     return image_to_resize.resize((final_w, final_h), Image.LANCZOS)
+
+ # =========================================================
+ # UTILITY FUNCTIONS
+ # =========================================================
+ def get_num_frames(duration_seconds: float):
+     return 1 + int(np.clip(int(round(duration_seconds * FIXED_FPS)), MIN_FRAMES_MODEL, MAX_FRAMES_MODEL))
+
+ def get_duration(
+     input_image, prompt, steps, negative_prompt,
+     duration_seconds, guidance_scale, guidance_scale_2,
+     seed, randomize_seed, progress,
+ ):
+     BASE_FRAMES_HEIGHT_WIDTH = 81 * 832 * 624
+     BASE_STEP_DURATION = 15
+     width, height = resize_image(input_image).size
+     frames = get_num_frames(duration_seconds)
+     factor = frames * width * height / BASE_FRAMES_HEIGHT_WIDTH
+     step_duration = BASE_STEP_DURATION * factor ** 1.5
+     return 10 + int(steps) * step_duration
+
+ # =========================================================
+ # MAIN GENERATION FUNCTION
+ # =========================================================
+ @spaces.GPU(duration=get_duration)
+ def generate_video(
+     input_image,
+     prompt,
+     steps=4,
+     negative_prompt=default_negative_prompt,
+     duration_seconds=MAX_DURATION,
+     guidance_scale=1,
+     guidance_scale_2=1,
+     seed=42,
+     randomize_seed=False,
+     progress=gr.Progress(track_tqdm=True),
+ ):
+     if input_image is None:
+         raise gr.Error("Please upload an input image.")
+
+     num_frames = get_num_frames(duration_seconds)
+     current_seed = random.randint(0, MAX_SEED) if randomize_seed else int(seed)
+     resized_image = resize_image(input_image)
+
+     output_frames_list = pipe(
+         image=resized_image,
+         prompt=prompt,
+         negative_prompt=negative_prompt,
+         height=resized_image.height,
+         width=resized_image.width,
+         num_frames=num_frames,
+         guidance_scale=float(guidance_scale),
+         guidance_scale_2=float(guidance_scale_2),
+         num_inference_steps=int(steps),
+         generator=torch.Generator(device="cuda").manual_seed(current_seed),
+     ).frames[0]
+
+     with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmpfile:
+         video_path = tmpfile.name
+     export_to_video(output_frames_list, video_path, fps=FIXED_FPS)
+     return video_path, current_seed
+
+ # =========================================================
+ # GRADIO UI
+ # =========================================================
+ with gr.Blocks() as demo:
+     gr.Markdown("# Wan 2.2 I2V LoRA Demo")
+     gr.Markdown("Try it out ⚡")
+
+     with gr.Row():
+         with gr.Column():
+             input_image_component = gr.Image(type="pil", label="Input Image")
+             prompt_input = gr.Textbox(label="Prompt", value=default_prompt_i2v)
+             duration_seconds_input = gr.Slider(
+                 minimum=MIN_DURATION, maximum=MAX_DURATION, step=0.1, value=3.5,
+                 label="Duration (seconds)",
+                 info=f"Model range: {MIN_FRAMES_MODEL}-{MAX_FRAMES_MODEL} frames at {FIXED_FPS}fps."
+             )
+
+             with gr.Accordion("Advanced Settings", open=False):
+                 negative_prompt_input = gr.Textbox(label="Negative Prompt", value=default_negative_prompt, lines=3)
+                 seed_input = gr.Slider(label="Seed", minimum=0, maximum=MAX_SEED, step=1, value=42)
+                 randomize_seed_checkbox = gr.Checkbox(label="Randomize seed", value=True)
+                 steps_slider = gr.Slider(minimum=1, maximum=30, step=1, value=6, label="Inference Steps")
+                 guidance_scale_input = gr.Slider(minimum=0.0, maximum=10.0, step=0.5, value=1, label="Guidance Scale (high noise)")
+                 guidance_scale_2_input = gr.Slider(minimum=0.0, maximum=10.0, step=0.5, value=1, label="Guidance Scale 2 (low noise)")
+
+             generate_button = gr.Button("🎬 Generate Video", variant="primary")
+
+         with gr.Column():
+             video_output = gr.Video(label="Generated Video", autoplay=True)
+
+     ui_inputs = [
+         input_image_component, prompt_input, steps_slider,
+         negative_prompt_input, duration_seconds_input,
+         guidance_scale_input, guidance_scale_2_input,
+         seed_input, randomize_seed_checkbox
+     ]
+     generate_button.click(fn=generate_video, inputs=ui_inputs, outputs=[video_output, seed_input])
+
+     gr.Examples(
+         examples=[
+             [
+                 "wan_i2v_input.JPG",
+                 "POV selfie video, white cat with sunglasses standing on surfboard, relaxed smile, tropical beach behind (clear water, green hills, blue sky with clouds). Surfboard tips, cat falls into ocean, camera plunges underwater with bubbles and sunlight beams. Brief underwater view of cat’s face, then cat resurfaces, still filming selfie, playful summer vacation mood.",
+                 4,
+             ],
+         ],
+         inputs=[input_image_component, prompt_input, steps_slider],
+         outputs=[video_output, seed_input],
+         fn=generate_video,
+         cache_examples="lazy"
+     )
+
+ if __name__ == "__main__":
+     demo.queue().launch(mcp_server=True)
example.py ADDED
@@ -0,0 +1,294 @@
+ import os
+ import spaces
+ import torch
+ from diffusers.pipelines.wan.pipeline_wan_i2v import WanImageToVideoPipeline
+ from diffusers.models.transformers.transformer_wan import WanTransformer3DModel
+ from diffusers.utils.export_utils import export_to_video
+ import gradio as gr
+ import tempfile
+ import numpy as np
+ from PIL import Image
+ import random
+ import gc
+
+ from torchao.quantization import quantize_
+ from torchao.quantization import Float8DynamicActivationFloat8WeightConfig, Int8WeightOnlyConfig
+ import aoti
+
+ # =========================================================
+ # MODEL CONFIGURATION
+ # =========================================================
+ MODEL_ID = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"  # new model path
+ HF_TOKEN = os.environ.get("HF_TOKEN")  # set a Hugging Face token here if the model is private
+
+ MAX_DIM = 832
+ MIN_DIM = 480
+ SQUARE_DIM = 640
+ MULTIPLE_OF = 16
+
+ MAX_SEED = np.iinfo(np.int32).max
+
+ FIXED_FPS = 16
+ MIN_FRAMES_MODEL = 8
+ MAX_FRAMES_MODEL = 7720
+
+ MIN_DURATION = round(MIN_FRAMES_MODEL / FIXED_FPS, 1)
+ MAX_DURATION = round(MAX_FRAMES_MODEL / FIXED_FPS, 1)
+
+ # =========================================================
+ # LOAD PIPELINE
+ # =========================================================
+ pipe = WanImageToVideoPipeline.from_pretrained(
+     MODEL_ID,
+     transformer=WanTransformer3DModel.from_pretrained(
+         MODEL_ID,
+         subfolder="transformer",
+         torch_dtype=torch.bfloat16,
+         device_map="cuda",
+         token=HF_TOKEN
+     ),
+     transformer_2=WanTransformer3DModel.from_pretrained(
+         MODEL_ID,
+         subfolder="transformer_2",
+         torch_dtype=torch.bfloat16,
+         device_map="cuda",
+         token=HF_TOKEN
+     ),
+     torch_dtype=torch.bfloat16,
+ ).to("cuda")
+
+ # =========================================================
+ # LOAD LORA ADAPTERS
+ # =========================================================
+ pipe.load_lora_weights(
+     "Kijai/WanVideo_comfy",
+     weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
+     adapter_name="lightx2v"
+ )
+ pipe.load_lora_weights(
+     "Kijai/WanVideo_comfy",
+     weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
+     adapter_name="lightx2v_2",
+     load_into_transformer_2=True
+ )
+
+ pipe.set_adapters(["lightx2v", "lightx2v_2"], adapter_weights=[1., 1.])
+ pipe.fuse_lora(adapter_names=["lightx2v"], lora_scale=3., components=["transformer"])
+ pipe.fuse_lora(adapter_names=["lightx2v_2"], lora_scale=1., components=["transformer_2"])
+ pipe.unload_lora_weights()
+
+ # =========================================================
+ # QUANTIZATION & AOT OPTIMIZATION
+ # =========================================================
+ quantize_(pipe.text_encoder, Int8WeightOnlyConfig())
+ quantize_(pipe.transformer, Float8DynamicActivationFloat8WeightConfig())
+ quantize_(pipe.transformer_2, Float8DynamicActivationFloat8WeightConfig())
+
+ aoti.aoti_blocks_load(pipe.transformer, 'zerogpu-aoti/Wan2', variant='fp8da')
+ aoti.aoti_blocks_load(pipe.transformer_2, 'zerogpu-aoti/Wan2', variant='fp8da')
+
+ # =========================================================
+ # DEFAULT PROMPTS
+ # =========================================================
+ default_prompt_i2v = "make this image come alive, cinematic motion, smooth animation"
+ default_negative_prompt = (  # standard quality/artifact negative prompt (in Chinese)
+     "色调艳丽, 过曝, 静态, 细节模糊不清, 字幕, 风格, 作品, 画作, 画面, 静止, 整体发灰, "
+     "最差质量, 低质量, JPEG压缩残留, 丑陋的, 残缺的, 多余的手指, 画得不好的手部, 画得不好的脸部, "
+     "畸形的, 毁容的, 形态畸形的肢体, 手指融合, 静止不动的画面, 杂乱的背景, 三条腿, 背景人很多, 倒着走"
+ )
+
+ # =========================================================
+ # IMAGE RESIZING LOGIC
+ # =========================================================
+ def resize_image(image: Image.Image) -> Image.Image:
+     width, height = image.size
+     if width == height:
+         return image.resize((SQUARE_DIM, SQUARE_DIM), Image.LANCZOS)
+
+     aspect_ratio = width / height
+     MAX_ASPECT_RATIO = MAX_DIM / MIN_DIM
+     MIN_ASPECT_RATIO = MIN_DIM / MAX_DIM
+
+     image_to_resize = image
+
+     if aspect_ratio > MAX_ASPECT_RATIO:
+         crop_width = int(round(height * MAX_ASPECT_RATIO))
+         left = (width - crop_width) // 2
+         image_to_resize = image.crop((left, 0, left + crop_width, height))
+     elif aspect_ratio < MIN_ASPECT_RATIO:
+         crop_height = int(round(width / MIN_ASPECT_RATIO))
+         top = (height - crop_height) // 2
+         image_to_resize = image.crop((0, top, width, top + crop_height))
+
+     if width > height:
+         target_w = MAX_DIM
+         target_h = int(round(target_w / aspect_ratio))
+     else:
+         target_h = MAX_DIM
+         target_w = int(round(target_h * aspect_ratio))
+
+     final_w = round(target_w / MULTIPLE_OF) * MULTIPLE_OF
+     final_h = round(target_h / MULTIPLE_OF) * MULTIPLE_OF
+
+     final_w = max(MIN_DIM, min(MAX_DIM, final_w))
+     final_h = max(MIN_DIM, min(MAX_DIM, final_h))
+
+     return image_to_resize.resize((final_w, final_h), Image.LANCZOS)
+
+ # =========================================================
+ # UTILITY FUNCTIONS
+ # =========================================================
+ def get_num_frames(duration_seconds: float):
+     return 1 + int(np.clip(int(round(duration_seconds * FIXED_FPS)), MIN_FRAMES_MODEL, MAX_FRAMES_MODEL))
+
+ def get_duration(
+     input_image, prompt, steps, negative_prompt,
+     duration_seconds, guidance_scale, guidance_scale_2,
+     seed, randomize_seed, progress,
+ ):
+     BASE_FRAMES_HEIGHT_WIDTH = 81 * 832 * 624
+     BASE_STEP_DURATION = 15
+     width, height = resize_image(input_image).size
+     frames = get_num_frames(duration_seconds)
+     factor = frames * width * height / BASE_FRAMES_HEIGHT_WIDTH
+     step_duration = BASE_STEP_DURATION * factor ** 1.5
+     return 10 + int(steps) * step_duration
+
+ # =========================================================
+ # MAIN GENERATION FUNCTION
+ # =========================================================
+ @spaces.GPU(duration=get_duration)
+ def generate_video(
+     input_image,
+     prompt,
+     steps=4,
+     negative_prompt=default_negative_prompt,
+     duration_seconds=MAX_DURATION,
+     guidance_scale=1,
+     guidance_scale_2=1,
+     seed=42,
+     randomize_seed=False,
+     progress=gr.Progress(track_tqdm=True),
+ ):
+     if input_image is None:
+         raise gr.Error("Please upload an input image.")
+
+     num_frames = get_num_frames(duration_seconds)
+     current_seed = random.randint(0, MAX_SEED) if randomize_seed else int(seed)
+     resized_image = resize_image(input_image)
+
+     output_frames_list = pipe(
+         image=resized_image,
+         prompt=prompt,
+         negative_prompt=negative_prompt,
+         height=resized_image.height,
+         width=resized_image.width,
+         num_frames=num_frames,
+         guidance_scale=float(guidance_scale),
+         guidance_scale_2=float(guidance_scale_2),
+         num_inference_steps=int(steps),
+         generator=torch.Generator(device="cuda").manual_seed(current_seed),
+     ).frames[0]
+
+     with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmpfile:
+         video_path = tmpfile.name
+     export_to_video(output_frames_list, video_path, fps=FIXED_FPS)
+     return video_path, current_seed
+
+ # =========================================================
+ # GRADIO UI
+ # =========================================================
+ with gr.Blocks() as demo:
+     gr.Markdown("# 🚀 Dream Wan 2.2 Faster Pro (14B) — Ultra Fast I2V with Lightning LoRA")
+     gr.Markdown("Optimized FP8 quantized pipeline with AoT blocks & 4-step fast inference ⚡")
+
+     with gr.Row():
+         with gr.Column():
+             input_image_component = gr.Image(type="pil", label="Input Image")
+             prompt_input = gr.Textbox(label="Prompt", value=default_prompt_i2v)
+             duration_seconds_input = gr.Slider(
+                 minimum=MIN_DURATION, maximum=MAX_DURATION, step=0.1, value=3.5,
+                 label="Duration (seconds)",
+                 info=f"Model range: {MIN_FRAMES_MODEL}-{MAX_FRAMES_MODEL} frames at {FIXED_FPS}fps."
+             )
+
+             with gr.Accordion("Advanced Settings", open=False):
+                 negative_prompt_input = gr.Textbox(label="Negative Prompt", value=default_negative_prompt, lines=3)
+                 seed_input = gr.Slider(label="Seed", minimum=0, maximum=MAX_SEED, step=1, value=42)
+                 randomize_seed_checkbox = gr.Checkbox(label="Randomize seed", value=True)
+                 steps_slider = gr.Slider(minimum=1, maximum=30, step=1, value=6, label="Inference Steps")
+                 guidance_scale_input = gr.Slider(minimum=0.0, maximum=10.0, step=0.5, value=1, label="Guidance Scale (high noise)")
+                 guidance_scale_2_input = gr.Slider(minimum=0.0, maximum=10.0, step=0.5, value=1, label="Guidance Scale 2 (low noise)")
+
+             generate_button = gr.Button("🎬 Generate Video", variant="primary")
+
+         with gr.Column():
+             video_output = gr.Video(label="Generated Video", autoplay=True)
+
+     ui_inputs = [
+         input_image_component, prompt_input, steps_slider,
+         negative_prompt_input, duration_seconds_input,
+         guidance_scale_input, guidance_scale_2_input,
+         seed_input, randomize_seed_checkbox
+     ]
+     generate_button.click(fn=generate_video, inputs=ui_inputs, outputs=[video_output, seed_input])
+
+     gr.Examples(
+         examples=[
+             [
+                 "wan_i2v_input.JPG",
+                 "POV selfie video, white cat with sunglasses standing on surfboard, relaxed smile, tropical beach behind (clear water, green hills, blue sky with clouds). Surfboard tips, cat falls into ocean, camera plunges underwater with bubbles and sunlight beams. Brief underwater view of cat’s face, then cat resurfaces, still filming selfie, playful summer vacation mood.",
+                 4,
+             ],
+         ],
+         inputs=[input_image_component, prompt_input, steps_slider],
+         outputs=[video_output, seed_input],
+         fn=generate_video,
+         cache_examples="lazy"
+     )
+
+ if __name__ == "__main__":
+     demo.queue().launch(mcp_server=True)
+
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ git+https://github.com/linoytsaban/diffusers.git@wan22-loras
+
+ transformers
+ accelerate
+ safetensors
+ sentencepiece
+ peft
+ ftfy
+ imageio-ffmpeg
+ opencv-python
+ torchao==0.11.0