SandwichZ committed
Commit ba98051 · verified · 1 Parent(s): 807994d

Delete README_en.md

Files changed (1): README_en.md +0 -352
README_en.md DELETED
---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-to-video
library_name: diffusers
tags:
- video
- video-generation
---

# Wan-Fun

😊 Welcome!

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/alibaba-pai/Wan2.1-Fun-1.3B-InP)

[![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)

[English](./README_en.md) | [简体中文](./README.md)

# Table of Contents
- [Table of Contents](#table-of-contents)
- [Model zoo](#model-zoo)
- [Video Result](#video-result)
- [Quick Start](#quick-start)
- [How to use](#how-to-use)
- [Reference](#reference)
- [License](#license)

# Model zoo

| Name | Storage Size | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| Wan2.2-Fun-A14B-InP | 64.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-InP) | Wan2.2-Fun-A14B text-to-video generation weights, trained at multiple resolutions, with support for start/end-frame prediction. |
| Wan2.2-Fun-A14B-Control | 64.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-Control) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control) | Wan2.2-Fun-A14B video control weights, supporting various control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports multi-resolution (512, 768, 1024) prediction of 81-frame videos, trained at 16 frames per second, with multilingual prediction support. |
| Wan2.2-Fun-A14B-Control-Camera | 64.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-Control-Camera) | Wan2.2-Fun-A14B camera-control weights. Supports multi-resolution (512, 768, 1024) video prediction, trained with 81 frames at 16 FPS, with multilingual prediction support. |
| Wan2.2-Fun-5B-InP | 23.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-5B-InP) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-5B-InP) | Wan2.2-Fun-5B text-to-video weights, trained with 121 frames at 24 FPS, supporting first/last-frame prediction. |
| Wan2.2-Fun-5B-Control | 23.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-5B-Control) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-5B-Control) | Wan2.2-Fun-5B video control weights, supporting control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Trained with 121 frames at 24 FPS, with multilingual prediction support. |
| Wan2.2-Fun-5B-Control-Camera | 23.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/Wan2.2-Fun-5B-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/Wan2.2-Fun-5B-Control-Camera) | Wan2.2-Fun-5B camera-control weights. Trained with 121 frames at 24 FPS, with multilingual prediction support. |

# Video Result

### Wan2.2-Fun-A14B-InP

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_1.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_2.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_3.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_4.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_5.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_6.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_7.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/inp_8.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

### Wan2.2-Fun-A14B-Control

Generic Control Video + Reference Image:
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
Reference Image
</td>
<td>
Control Video
</td>
<td>
Wan2.2-Fun-A14B-Control
</td>
</tr>
<tr>
<td>
<img src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/8.png" width="100%">
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/pose.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/14b_ref.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

Generic Control Video (Canny, Pose, Depth, etc.) and Trajectory Control:
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/guiji.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/guiji_out.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/pose.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/canny.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/depth.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/pose_out.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/canny_out.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/depth_out.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

### Wan2.2-Fun-A14B-Control-Camera

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
Pan Up
</td>
<td>
Pan Left
</td>
<td>
Pan Right
</td>
<td>
Zoom In
</td>
</tr>
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Pan_Up.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Pan_Left.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Pan_Right.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Zoom_In.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
<tr>
<td>
Pan Down
</td>
<td>
Pan Up + Pan Left
</td>
<td>
Pan Up + Pan Right
</td>
<td>
Zoom Out
</td>
</tr>
<tr>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Pan_Down.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Pan_UL.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Pan_UR.mp4" width="100%" controls autoplay loop></video>
</td>
<td>
<video src="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset_Wan2_2/v1.0/Zoom_Out.mp4" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

# Quick Start
### 1. Cloud usage: AliyunDSW/Docker
#### a. From AliyunDSW
DSW offers free GPU time, which each user can apply for once; it remains valid for 3 months after application.

Aliyun provides free GPU time through [Freetier](https://free.aliyun.com/?product=9602825&crowd=enterprise&spm=5176.28055625.J_5831864660.1.e939154aRgha4e&scm=20140722.M_9974135.P_110.MO_1806-ID_9974135-MID_9974135-CID_30683-ST_8512-V_1). Claim it and use it in Aliyun PAI-DSW to start CogVideoX-Fun within 5 minutes!

[![DSW Notebook](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/dsw.png)](https://gallery.pai-ml.com/#/preview/deepLearning/cv/cogvideox_fun)

#### b. From ComfyUI
Our ComfyUI workflow is shown below; please refer to the [ComfyUI README](comfyui/README.md) for details.
![workflow graph](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/cogvideoxfunv1_workflow_i2v.jpg)

#### c. From docker
If you are using Docker, please make sure that the graphics card driver and CUDA environment have been installed correctly on your machine.

Then execute the following commands:
```sh
# pull the image
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# enter the image (run the container)
docker run -it -p 7860:7860 --network host --gpus all --security-opt seccomp:unconfined --shm-size 200g mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# enter VideoX-Fun's dir
cd VideoX-Fun

# download weights
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

# Please use the Hugging Face link or ModelScope link to download the model.
# CogVideoX-Fun
# https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP
# https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP

# Wan
# https://huggingface.co/alibaba-pai/Wan2.1-Fun-V1.1-14B-InP
# https://modelscope.cn/models/PAI/Wan2.1-Fun-V1.1-14B-InP
# https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP
# https://modelscope.cn/models/PAI/Wan2.2-Fun-A14B-InP
```
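
If you prefer to script the download step above, the snippet below is a minimal Python sketch using `huggingface_hub.snapshot_download`; the repo IDs come from the links in the comments, and the target folder matches the `models/Diffusion_Transformer` directory created above. ModelScope users can do the equivalent with the ModelScope SDK's own `snapshot_download`.

```python
# Minimal download sketch (assumes `pip install huggingface_hub`).
# Repo IDs are taken from the links above; adjust to the weights you actually need.
from huggingface_hub import snapshot_download

for repo_id in ["alibaba-pai/CogVideoX-Fun-V1.1-5b-InP", "alibaba-pai/Wan2.2-Fun-A14B-InP"]:
    local_dir = f"models/Diffusion_Transformer/{repo_id.split('/')[-1]}"
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    print(f"Downloaded {repo_id} -> {local_dir}")
```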

### 2. Local install: Environment Check/Downloading/Installation
#### a. Environment Check
We have verified that this repo runs in the following environments:

Details for Windows:
- OS: Windows 10
- python: python3.10 & python3.11
- pytorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-3060 12G & Nvidia-3090 24G

Details for Linux:
- OS: Ubuntu 20.04, CentOS
- python: python3.10 & python3.11
- pytorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G

About 60 GB of free disk space is needed (for saving weights), so please check before installing!

#### b. Weights
It is best to place the [weights](#model-zoo) along the specified paths:

**Via ComfyUI**:
Put the models into the ComfyUI weights folder `ComfyUI/models/Fun_Models/`:
```
📦 ComfyUI/
├── 📂 models/
│   └── 📂 Fun_Models/
│       ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│       ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│       ├── 📂 Wan2.1-Fun-14B-InP/
│       └── 📂 Wan2.1-Fun-1.3B-InP/
```

**Running its own Python files or the UI interface**:
```
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│   ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│   ├── 📂 Wan2.1-Fun-14B-InP/
│   └── 📂 Wan2.1-Fun-1.3B-InP/
├── 📂 Personalized_Model/
│   └── your trained transformer model / your trained LoRA model (for UI loading)
```

# How to Use

<h3 id="video-gen">1. Generation</h3>

#### a. GPU Memory Optimization
Since Wan2.1 has a very large number of parameters, we need memory optimization strategies so the models fit on consumer-grade GPUs. We provide `GPU_memory_mode` in each prediction file, allowing you to choose between `model_cpu_offload`, `model_cpu_offload_and_qfloat8`, and `sequential_cpu_offload`. This solution is also applicable to CogVideoX-Fun generation.

- `model_cpu_offload`: The entire model is moved to the CPU after use, saving some GPU memory.
- `model_cpu_offload_and_qfloat8`: The entire model is moved to the CPU after use, and the transformer is quantized to float8, saving more GPU memory.
- `sequential_cpu_offload`: Each layer of the model is moved to the CPU after use. It is slower but saves a significant amount of GPU memory.

`qfloat8` may slightly reduce model performance but saves more GPU memory. If you have sufficient GPU memory, `model_cpu_offload` is recommended.
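
As a rough illustration of what these modes correspond to in a diffusers-style pipeline, here is a hedged sketch (not the repository's actual implementation: the checkpoint path is a placeholder, the Fun predict scripts build their own pipelines, and the `qfloat8` variant, which additionally quantizes the transformer to float8, is omitted):

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder path: the real predict scripts construct their pipelines themselves.
pipe = DiffusionPipeline.from_pretrained(
    "models/Diffusion_Transformer/Wan2.1-Fun-1.3B-InP", torch_dtype=torch.bfloat16
)

GPU_memory_mode = "model_cpu_offload"  # or "sequential_cpu_offload"

if GPU_memory_mode == "model_cpu_offload":
    # Whole sub-models are moved back to the CPU after each use: moderate savings.
    pipe.enable_model_cpu_offload()
elif GPU_memory_mode == "sequential_cpu_offload":
    # Layers are offloaded one by one: lowest VRAM use, but noticeably slower.
    pipe.enable_sequential_cpu_offload()
else:
    # Enough VRAM available: keep everything on the GPU.
    pipe.to("cuda")
```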

#### b. Using ComfyUI
For details, refer to the [ComfyUI README](https://github.com/aigc-apps/VideoX-Fun/tree/main/comfyui).

#### c. Running Python Files
- **Step 1**: Download the corresponding [weights](#model-zoo) and place them in the `models` folder.
- **Step 2**: Use different files for prediction depending on the weights and prediction goals. This library currently supports CogVideoX-Fun, Wan2.1, Wan2.1-Fun, and Wan2.2. Different models are distinguished by folder names under the `examples` folder, and their supported features vary; use them accordingly. Below is an example using CogVideoX-Fun (a sketch of the kind of variables you edit follows this list):
  - **Text-to-Video**:
    - Modify `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_t2v.py`.
    - Run the file `examples/cogvideox_fun/predict_t2v.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos`.
  - **Image-to-Video**:
    - Modify `validation_image_start`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_i2v.py`.
    - `validation_image_start` is the starting image of the video, and `validation_image_end` is the ending image of the video.
    - Run the file `examples/cogvideox_fun/predict_i2v.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos_i2v`.
  - **Video-to-Video**:
    - Modify `validation_video`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_v2v.py`.
    - `validation_video` is the reference video for video-to-video generation. You can use the following demo video: [Demo Video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/play_guitar.mp4).
    - Run the file `examples/cogvideox_fun/predict_v2v.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos_v2v`.
  - **Controlled Video Generation (Canny, Pose, Depth, etc.)**:
    - Modify `control_video`, `validation_image_end`, `prompt`, `neg_prompt`, `guidance_scale`, and `seed` in the file `examples/cogvideox_fun/predict_v2v_control.py`.
    - `control_video` is the control video extracted using operators such as Canny, Pose, or Depth. You can use the following demo video: [Demo Video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1.1/pose.mp4).
    - Run the file `examples/cogvideox_fun/predict_v2v_control.py` and wait for the results. The generated videos will be saved in the folder `samples/cogvideox-fun-videos_v2v_control`.
- **Step 3**: If you want to integrate other backbones or LoRAs trained by yourself, modify `lora_path` and the relevant paths in `examples/{model_name}/predict_t2v.py` or `examples/{model_name}/predict_i2v.py` as needed.
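
As referenced in Step 2, the snippet below shows the kind of settings you edit near the top of a prediction script such as `examples/cogvideox_fun/predict_i2v.py`. The variable names are the ones listed above; the values and the image path are purely illustrative, and the rest of the script supplies the actual pipeline and saving logic.

```python
# Illustrative values only; edit the corresponding variables in the predict script.
GPU_memory_mode = "model_cpu_offload"  # see the GPU memory notes above
prompt = "A corgi runs along a beach at sunset, cinematic lighting."
neg_prompt = "blurry, low quality, distorted, watermark"
guidance_scale = 6.0
seed = 43
validation_image_start = "asset/start_frame.png"  # hypothetical path to the first frame
validation_image_end = None  # optional last frame for start/end-frame prediction
```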

#### d. Using the Web UI
The web UI supports text-to-video, image-to-video, video-to-video, and controlled video generation (Canny, Pose, Depth, etc.). This library currently supports CogVideoX-Fun, Wan2.1, and Wan2.1-Fun. Different models are distinguished by folder names under the `examples` folder, and their supported features vary. Use them accordingly. Below is an example using CogVideoX-Fun:

- **Step 1**: Download the corresponding [weights](#model-zoo) and place them in the `models` folder.
- **Step 2**: Run the file `examples/cogvideox_fun/app.py` to access the Gradio interface.
- **Step 3**: Select the generation model on the page, fill in `prompt`, `neg_prompt`, `guidance_scale`, and `seed`, click "Generate", and wait for the results. The generated videos will be saved in the `sample` folder.

# Reference
- CogVideo: https://github.com/THUDM/CogVideo/
- EasyAnimate: https://github.com/aigc-apps/EasyAnimate
- Wan2.1: https://github.com/Wan-Video/Wan2.1/
- Wan2.2: https://github.com/Wan-Video/Wan2.2/
- ComfyUI-KJNodes: https://github.com/kijai/ComfyUI-KJNodes
- ComfyUI-EasyAnimateWrapper: https://github.com/kijai/ComfyUI-EasyAnimateWrapper
- ComfyUI-CameraCtrl-Wrapper: https://github.com/chaojie/ComfyUI-CameraCtrl-Wrapper
- CameraCtrl: https://github.com/hehao13/CameraCtrl

# License
This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).