wangkanai committed on
Commit 5ca336d · verified · 1 Parent(s): 3fa7af1

Add files using upload-large-folder tool

Files changed (1): README.md (+518, -0)
README.md ADDED
---
license: other
library_name: diffusers
pipeline_tag: image-to-video
tags:
- wan
- wan22
- image-to-video
- video-generation
- fp16
---

<!-- README Version: v1.3 -->

# WAN 2.2 FP16 - Image-to-Video Models (Maximum Quality)

Image-to-video (I2V) generation models in full FP16 precision for maximum-quality video generation. This repository contains the core I2V diffusion models, aimed at research-grade and archival-quality video synthesis.

## Model Description

WAN 2.2 FP16 is a 14-billion-parameter, diffusion-based video generation model, distributed here in full FP16 precision for maximum-quality image-to-video generation. This repository contains the essential I2V diffusion models for high-end video generation workloads.

**Key Features**:
- 14B parameter diffusion-based architecture
- Full FP16 precision for maximum quality (27GB per model)
- Dedicated high-noise (creative) and low-noise (faithful) generation modes
- Image-to-video capabilities with cinematic quality output
- Optimized for research, archival quality, and final production renders

**Model Statistics**:
- **Total Repository Size**: ~54GB
- **Model Architecture**: Diffusion-based image-to-video generation
- **Format**: `.safetensors` (FP16)
- **Parameters**: 14 billion
- **Precision**: FP16 (full precision, no quantization)
- **Input**: Images + text prompts
- **Output**: Video sequences (typically 16-24 frames)

## Repository Contents

### Diffusion Models

Located in `diffusion_models/wan/`

| File | Size | Type | VRAM Required | Description |
|------|------|------|---------------|-------------|
| `wan22-i2v-14b-fp16-high.safetensors` | 27GB | FP16 I2V | 24GB+ | High-noise variant - Creative generation with higher variance |
| `wan22-i2v-14b-fp16-low.safetensors` | 27GB | FP16 I2V | 24GB+ | Low-noise variant - Faithful reproduction with consistent results |

**Total Size**: ~54GB

## Hardware Requirements

### Minimum Requirements

| Component | Requirement |
|-----------|-------------|
| **GPU VRAM** | 24GB minimum |
| **Recommended VRAM** | 32GB+ |
| **Disk Space** | 54GB free space |
| **System RAM** | 32GB+ recommended |
| **CUDA** | 11.8+ or 12.1+ |
| **PyTorch** | 2.0+ with FP16 support |

### Compatible GPUs

**Minimum (24GB VRAM)**:
- NVIDIA RTX 4090 (24GB)
- NVIDIA RTX A5000 (24GB)

**Recommended (32GB+ VRAM)**:
- NVIDIA A100 (40GB/80GB)
- NVIDIA H100 (80GB)
- NVIDIA RTX 6000 Ada (48GB)
- NVIDIA A6000 (48GB)
- Multi-GPU setups

**Not Compatible**:
- GPUs with less than 24GB VRAM (RTX 4080, RTX 3080, etc.) - see the check below
- For lower VRAM requirements, see GGUF quantized variants in other repositories
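
Before committing to a 54GB download, it can be worth confirming that the local GPU actually meets the 24GB minimum. A minimal check using standard PyTorch device queries (a sketch; the threshold follows the table above):

```python
import torch

# Report the primary GPU's VRAM and compare it against the 24GB minimum.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    verdict = "OK for FP16" if vram_gb >= 24 else "use a quantized variant"
    print(f"{props.name}: {vram_gb:.0f} GB VRAM -> {verdict}")
else:
    print("No CUDA device found")
```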

## Usage Examples

### Basic Image-to-Video Generation

```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from safetensors.torch import load_file
import torch
from PIL import Image

# Load input image
input_image = Image.open("path/to/your/image.jpg")

# Load I2V pipeline with FP16 precision
pipe = DiffusionPipeline.from_pretrained(
    "path-to-base-wan22-model",
    torch_dtype=torch.float16
)

# Load the WAN 2.2 FP16 I2V weights (high-noise variant for creative generation).
# SafeTensors files cannot be opened with torch.load; read them with safetensors
# and load the resulting state dict into the pipeline's denoiser.
state_dict = load_file(
    "E:/huggingface/wan22-fp16-i2v/diffusion_models/wan/wan22-i2v-14b-fp16-high.safetensors"
)
pipe.unet.load_state_dict(state_dict)

pipe.to("cuda")

# Generate video from image
video = pipe(
    image=input_image,
    prompt="cinematic shot, high quality, detailed",
    num_inference_steps=50,
    num_frames=16
).frames

# Save video
export_to_video(video, "output.mp4", fps=8)
```

### Using Low-Noise Variant

```python
from safetensors.torch import load_file

# Load the low-noise variant for more faithful reproduction
pipe.unet.load_state_dict(load_file(
    "E:/huggingface/wan22-fp16-i2v/diffusion_models/wan/wan22-i2v-14b-fp16-low.safetensors"
))

# Generate video with consistent, faithful results
video = pipe(
    image=input_image,
    prompt="realistic scene, photographic quality",
    num_inference_steps=50,
    num_frames=16
).frames
```

### Memory Optimization

```python
# Enable CPU offloading if running into VRAM limits
pipe.enable_model_cpu_offload()

# Enable attention slicing for memory efficiency
pipe.enable_attention_slicing()

# For systems with 24GB VRAM, reduce frame count
video = pipe(
    image=input_image,
    prompt="your prompt",
    num_inference_steps=50,
    num_frames=12  # Reduced from 16 for memory efficiency
).frames
```

## Model Specifications

### Architecture Details

- **Model Type**: Diffusion transformer for image-to-video generation
- **Parameters**: 14 billion
- **Precision**: FP16 (IEEE 754 half-precision floating point)
- **Format**: SafeTensors (secure tensor serialization format)
- **Context Length**: Image conditioning + text prompt
- **Output Format**: Video frame sequences

### Noise Schedule Variants

**High-Noise Model** (`wan22-i2v-14b-fp16-high.safetensors`):
- Greater noise variance during diffusion
- More creative interpretation of input
- Better for abstract, stylized, or artistic content
- Higher output variance across generations

**Low-Noise Model** (`wan22-i2v-14b-fp16-low.safetensors`):
- Lower noise variance during diffusion
- More faithful to input image and prompt
- Better for realistic, photographic content
- More consistent and predictable results
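
Since both checkpoints share the same architecture, switching variants is just a weight swap. A minimal helper sketch, reusing the safetensors loading pattern from the usage examples above (the relative paths assume this repository's layout, and `use_variant` is a hypothetical name):

```python
from safetensors.torch import load_file

# Relative paths follow this repository's directory layout.
VARIANTS = {
    "high": "diffusion_models/wan/wan22-i2v-14b-fp16-high.safetensors",  # creative
    "low": "diffusion_models/wan/wan22-i2v-14b-fp16-low.safetensors",    # faithful
}

def use_variant(pipe, variant):
    """Swap the I2V weights in place: 'high' for stylized output, 'low' for realism."""
    pipe.unet.load_state_dict(load_file(VARIANTS[variant]))
    return pipe
```

For example, call `use_variant(pipe, "low")` before generating photographic content, then `use_variant(pipe, "high")` for stylized shots.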

## Performance Tips

### Quality Optimization

1. **FP16 Precision**: These models provide maximum quality with no quantization artifacts
2. **Inference Steps**: Use 50-100 steps for best quality, 20-30 for rapid prototyping
3. **Noise Variant Selection**:
   - Use high-noise for creative, artistic outputs
   - Use low-noise for realistic, consistent results
4. **Prompt Engineering**: Detailed, specific prompts yield better results

### Speed Optimization

1. **Enable xFormers**: `pipe.enable_xformers_memory_efficient_attention()`
2. **Reduce Inference Steps**: Start with 20-30 steps for testing
3. **Optimize Frame Count**: Use 8-12 frames for faster generation
4. **Batch Processing**: Generate multiple videos sequentially to amortize model loading (see the sketch below)
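
Putting the speed tips together, a draft-quality configuration might look like the following sketch (`input_image` is the placeholder from the usage examples above; xFormers must be installed separately):

```python
# Draft-quality settings: fast iteration at reduced fidelity.
pipe.enable_xformers_memory_efficient_attention()  # requires the xformers package

draft = pipe(
    image=input_image,
    prompt="your prompt",
    num_inference_steps=25,  # 20-30 steps for rapid prototyping
    num_frames=8,            # fewer frames generate faster
).frames
```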

### Memory Management

1. **CPU Offloading**: `pipe.enable_model_cpu_offload()` for VRAM management
2. **Attention Slicing**: `pipe.enable_attention_slicing()` for memory efficiency
3. **Gradient Checkpointing**: Enable if fine-tuning
4. **Clear Cache**: `torch.cuda.empty_cache()` between generations (see the loop sketch below)
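
For example, a sequential batch loop that amortizes the one-time model load and clears the CUDA cache between runs might look like this (a sketch; the prompt list is hypothetical):

```python
import torch
from diffusers.utils import export_to_video

prompts = ["city street at dusk", "waves on a rocky shore"]  # hypothetical inputs

for i, prompt in enumerate(prompts):
    video = pipe(
        image=input_image,
        prompt=prompt,
        num_inference_steps=50,
        num_frames=12,
    ).frames
    export_to_video(video, f"output_{i}.mp4", fps=8)
    torch.cuda.empty_cache()  # release cached blocks between generations
```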

### GPU-Specific Tips

**RTX 4090 (24GB)**:
- Optimal performance with FP16 models
- Reduce frame count to 12-14 for stability
- Enable attention slicing for a safety margin

**RTX 6000 Ada / A6000 (48GB)**:
- Full frame counts (16-24) without issues
- Can run batch processing or parallel pipelines
- Optimal for production workloads

**A100 / H100 (40GB-80GB)**:
- Maximum performance and flexibility
- Suitable for research and large-scale production
- Can handle extended frame sequences

## Prompting Guidelines

### Effective Prompt Structure

```
[Style/Quality] [Subject/Scene] [Action/Motion] [Technical Details]
```

### Example Prompts

**Cinematic**:
- "cinematic shot, high quality, detailed lighting, professional cinematography"
- "film-like quality, dramatic shadows, cinematic color grading"

**Realistic**:
- "photorealistic, natural lighting, high detail, realistic motion"
- "documentary style, authentic atmosphere, lifelike movement"

**Artistic**:
- "stylized art, creative interpretation, abstract motion, artistic flair"
- "surreal atmosphere, dreamlike quality, artistic vision"

### Prompt Tips

1. **Be Specific**: Detailed prompts yield better results
2. **Include Quality Terms**: "high quality", "detailed", "cinematic"
3. **Describe Motion**: Specify desired movement or action
4. **Lighting Description**: Mention lighting conditions for better results
5. **Avoid Negatives**: Focus on what you want, not what you don't want
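
The four-slot structure above is easy to template. A tiny illustration (the `build_prompt` helper and its arguments are hypothetical, not part of any WAN or diffusers API):

```python
def build_prompt(style, subject, action, technical):
    """Join the four slots: [Style/Quality] [Subject/Scene] [Action/Motion] [Technical Details]."""
    return ", ".join([style, subject, action, technical])

prompt = build_prompt(
    "cinematic shot, high quality",
    "an old lighthouse on a cliff",
    "waves crashing in slow motion",
    "dramatic shadows, cinematic color grading",
)
```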

## Intended Uses

### Direct Use

WAN 2.2 FP16 is designed for:
- **Research**: Academic research in video generation and diffusion models
- **Archival Quality**: Maximum quality video generation for preservation
- **Final Production**: High-end content creation and professional video production
- **Quality Benchmarking**: Reference standard for video generation quality assessment

### Downstream Use

- Fine-tuning on specialized datasets
- Quality baseline for model comparison
- Integration with high-end video production pipelines
- Training data generation for downstream tasks

### Out-of-Scope Use

The model should **NOT** be used for:
- Generating deceptive, harmful, or misleading video content
- Creating deepfakes or non-consensual content of individuals
- Producing content that violates copyright or intellectual property rights
- Generating content intended to harass, abuse, or discriminate
- Creating videos for illegal purposes or activities
- Systems with insufficient VRAM (<24GB) - use quantized variants instead

## Limitations and Considerations

### Technical Limitations

**Hardware Constraints**:
- **Requires 24GB+ VRAM**: Not accessible on consumer GPUs below RTX 4090 tier
- **Large Model Size**: 27GB per model requires substantial disk space and loading time
- **Inference Speed**: FP16 precision trades speed for quality
- **Memory Intensive**: May require memory management techniques on 24GB systems

**Generation Quality**:
- **Temporal Consistency**: May produce flickering in complex motion sequences
- **Fine Details**: Small objects or intricate textures may lack perfect consistency
- **Physical Realism**: Generated physics may not always follow real-world rules
- **Text Rendering**: Cannot reliably render readable text within videos
- **Face Quality**: Faces may show artifacts (face-enhancement LoRAs can help but are not included in this repo)

### Content Limitations

- Training data biases may affect representation diversity
- May struggle with uncommon objects or rare scenarios
- Generated content may reflect biases present in training data
- No built-in content filtering or moderation

## Risks and Mitigations

### Misuse Risks

**Deepfakes and Misinformation**:
- Risk: Model could generate deceptive content
- Mitigation: Implement watermarking, content authentication, usage monitoring

**Copyright Infringement**:
- Risk: May generate content similar to copyrighted material
- Mitigation: Content filtering, responsible use guidelines

**Harmful Content**:
- Risk: Could generate disturbing or inappropriate content
- Mitigation: Safety filters, content moderation, ethical usage policies

### Ethical Considerations

- Obtain appropriate permissions before generating videos of identifiable individuals
- Label AI-generated content clearly to prevent deception
- Consider environmental impact of compute-intensive inference
- Respect privacy, consent, and intellectual property rights

### Recommendations

1. Implement content moderation in production deployments
2. Add visible/invisible watermarks to identify AI-generated content
3. Provide clear disclaimers about AI generation
4. Monitor for misuse and enforce usage policies
5. Validate outputs for unintended biases before distribution
6. Consider carbon offset for high-volume production use

## Training Details

### Training Data

Specific training data details are not publicly available. Typical video diffusion models of this scale are trained on:
- Large-scale video datasets with diverse content
- Text-video pairs for caption conditioning
- Image-video pairs for I2V tasks

**Note**: Contact the original model authors for specific training dataset information.

### Training Procedure

**Architecture**:
- Diffusion transformer with 14B parameters
- FP16 precision training
- Separate noise schedules for the high-noise and low-noise variants

**Noise Schedules**:
- **High-noise**: Greater variance for creative generation
- **Low-noise**: Lower variance for faithful reproduction

## Environmental Impact

Video generation models require significant computational resources.

### Resource Consumption

- **Model Size**: 54GB total (two 27GB models)
- **Inference Power Draw**: 350-450W while generating (high-end GPUs)
- **Training Impact**: Not disclosed (training carbon footprint unknown)
- **Inference Carbon**: Varies by energy source and usage patterns

### Recommendations for Reducing Impact

1. **Use Quantized Models**: Consider GGUF variants for efficiency (not in this repo)
2. **Batch Processing**: Amortize overhead across multiple generations
3. **Optimize Inference**: Use fewer steps for non-critical applications
4. **Energy-Efficient Hardware**: Use modern GPUs with better performance-per-watt
5. **Carbon Offset**: Consider offsetting for production deployments
6. **On-Demand Usage**: Load models only when needed, unload after use

## License

This repository uses the "other" license tag with license name "wan-license". Please check the original WAN 2.2 model repository for specific license terms, usage restrictions, and commercial use guidelines.

**Important**: Verify license compatibility before using in commercial or production applications.

## Citation

If you use WAN 2.2 in your research or applications, please cite the original model:

```bibtex
@misc{wan22,
  title={WAN 2.2: Image-to-Video and Text-to-Video Generation},
  author={WAN Team},
  year={2024},
  howpublished={Hugging Face Model Repository}
}
```

## Troubleshooting

### Out of Memory Errors

**Problem**: CUDA out of memory during inference

**Solutions**:
1. Enable CPU offloading: `pipe.enable_model_cpu_offload()`
2. Enable attention slicing: `pipe.enable_attention_slicing()`
3. Reduce frame count: Use 8-12 frames instead of 16
4. Clear CUDA cache: `torch.cuda.empty_cache()`
5. Use sequential CPU offload: `pipe.enable_sequential_cpu_offload()`
6. Consider GGUF quantized models (available in other repositories)

**Note**: If errors persist with 24GB VRAM, these FP16 models may not be suitable for your hardware. Consider GGUF Q8 or Q4 variants.
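
The two offload modes in the list above trade speed against peak VRAM differently; a short sketch of the distinction (behavior as documented for diffusers pipelines):

```python
# Option A: whole submodels hop onto the GPU one at a time.
# Moderate savings with a small speed penalty - try this first.
pipe.enable_model_cpu_offload()

# Option B: leaf submodules are streamed to the GPU on demand.
# Lowest peak VRAM but markedly slower - a last resort on 24GB cards.
# Enable one mode or the other, not both.
# pipe.enable_sequential_cpu_offload()
```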

### Slow Generation Speed

**Problem**: Video generation takes too long

**Solutions**:
1. Enable xFormers: `pipe.enable_xformers_memory_efficient_attention()`
2. Reduce inference steps: Start with 20-30 steps
3. Reduce frame count: Use 8-12 frames for faster generation
4. Optimize CUDA: Ensure CUDA 12.1+ for best performance
5. Consider GGUF Q4 models for faster inference (not in this repo)

### Quality Issues

**Problem**: Generated videos lack quality or consistency

**Solutions**:
1. **Try both noise variants**: Test the high-noise and low-noise models
2. **Increase inference steps**: Use 50-100 steps for best quality
3. **Improve prompts**: Be more specific and detailed
4. **Check model loading**: Ensure the FP16 model loaded correctly
5. **Verify input image**: High-quality input yields better output

**Note**: FP16 models provide maximum quality; if quality is still insufficient, the issue is likely prompt engineering or input image quality.

### Model Loading Issues

**Problem**: Error loading SafeTensors files

**Solutions**:
1. Verify file integrity: Check that the file size matches 27GB (see the check below)
2. Ensure sufficient disk space: Need 27GB+ free space
3. Update dependencies: `pip install --upgrade diffusers safetensors torch`
4. Check PyTorch version: Requires PyTorch 2.0+ with FP16 support
5. Verify CUDA installation: Ensure CUDA 11.8+ or 12.1+
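
A small sketch of the integrity check in step 1, using the safetensors API (`safe_open` parses only the file header, so a truncated download fails fast without loading 27GB of weights):

```python
import os
from safetensors import safe_open

path = "diffusion_models/wan/wan22-i2v-14b-fp16-high.safetensors"

# Size check: each FP16 model should be roughly 27GB on disk.
size_gb = os.path.getsize(path) / 1024**3
print(f"file size: {size_gb:.1f} GB")

# Header check: parses metadata only, no tensor data is read.
with safe_open(path, framework="pt") as f:
    print(f"tensors in file: {len(list(f.keys()))}")
```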

## Related Repositories

### Other WAN 2.2 Repositories

- **wan22-fp8**: FP8 and GGUF quantized I2V + T2V models with LoRAs (~89GB)
  - Includes text-to-video models
  - Includes 10 enhancement LoRAs (camera control, lighting, etc.)
  - 16GB VRAM requirement for FP8 models

### Previous WAN Versions

- **wan21-fp16**: WAN 2.1 FP16 models (camera control v1, I2V only)
- **wan21-fp8**: WAN 2.1 FP8 models (camera control v1, I2V only)

### Complementary Resources

For the complete WAN 2.2 ecosystem:
- **VAE Models**: Available in the wan22-fp8 repository
- **LoRA Adapters**: Available in the wan22-fp8 repository (camera control, lighting, face enhancement)
- **Text-to-Video**: Available in the wan22-fp8 repository

## Model Card Information

**Model Card Authors**: Repository maintainer
**Model Card Contact**: Please open an issue in the repository
**Last Updated**: October 2024
**Model Version**: WAN 2.2 FP16 (v1.0)
**Repository Type**: Full Precision Model Weights

## Support

For issues, questions, or contributions:
- Check the troubleshooting section above
- Refer to the main Hugging Face model repository
- Open an issue in this repository
- Consult the diffusers library documentation

## Summary

**WAN 2.2 FP16 - Maximum Quality I2V Models**

This repository contains WAN 2.2 image-to-video models in full FP16 precision for maximum quality video generation:

- **2 Models**: High-noise and low-noise variants
- **54GB Total**: 27GB per model
- **FP16 Precision**: No quantization, maximum quality
- **24GB+ VRAM Required**: High-end GPUs only (RTX 4090, A5000, A6000+)
- **Research Grade**: Archival quality and final production renders
- **Image-to-Video Only**: For text-to-video and LoRAs, see wan22-fp8

**Recommended For**:
- Research and academic applications
- Archival quality video generation
- Final production renders
- Quality benchmarking and reference standards
- High-end video production workflows

**Not Recommended For**:
- Systems with <24GB VRAM (use GGUF quantized variants)
- Rapid prototyping (use GGUF Q4 variants)
- Budget or consumer GPUs (use FP8 or GGUF variants)

**Quality Hierarchy**: FP16 (this repo) > FP8 > GGUF Q8 > GGUF Q4

---

**Repository Statistics**:
- **Total Size**: ~54GB
- **File Count**: 2 models
- **Format**: SafeTensors (FP16)
- **Primary Use Case**: Maximum quality I2V generation for research and production