Ruihang committed on
Commit 63a21fd · verified · 1 Parent(s): 803c3c0

Update README.md

Files changed (1): README.md (+214, −6)
Looking forward to the Gradio launch soon to support everyone in freely creating their own videos.

# Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

[![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](https://arxiv.org/abs/2512.08765)
[![Code](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/ali-vilab/Wan-Move)
[![Model](https://img.shields.io/badge/HuggingFace-Model-yellow)](https://huggingface.co/Ruihang/Wan-Move-14B-480P)
[![Model](https://img.shields.io/badge/ModelScope-Model-violet)](https://modelscope.cn/models/churuihang/Wan-Move-14B-480P)
[![Model](https://img.shields.io/badge/HuggingFace-MoveBench-cyan)](https://huggingface.co/datasets/Ruihang/MoveBench)
[![Video](https://img.shields.io/badge/YouTube-Video-red)](https://www.youtube.com/watch?v=_5Cy7Z2NQJQ)
[![Website](https://img.shields.io/badge/Demo-Page-brown)](https://wan-move.github.io/)

<div align="center">

[![Watch the video](assets/video-first-frame.png)](https://www.youtube.com/watch?v=_5Cy7Z2NQJQ)

</div>

## 💡 TLDR: Bring Wan I2V to SOTA fine-grained, point-level motion control!

**Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance [[Paper](https://arxiv.org/abs/2512.08765)]** <br />
[Ruihang Chu](https://scholar.google.com/citations?hl=zh-CN&user=62zPPxkAAAAJ), [Yefei He](https://hexy.tech/), [Zhekai Chen](https://scholar.google.com/citations?user=_eZWcIMAAAAJ), [Shiwei Zhang](https://scholar.google.com/citations?user=ZO3OQ-8AAAAJ), [Xiaogang Xu](https://xuxiaogang.com/), [Bin Xia](https://zj-binxia.github.io/), [Dingdong Wang](https://scholar.google.com/citations?user=hRWxWiEAAAAJ), [Hongwei Yi](https://scholar.google.com/citations?user=ocMf7fQAAAAJ), [Xihui Liu](https://xh-liu.github.io/), [Hengshuang Zhao](https://hszhao.github.io/), [Yu Liu](https://scholar.google.com/citations?user=8zksQb4AAAAJ), [Yingya Zhang](https://scholar.google.com/citations?user=16RDSEUAAAAJ), [Yujiu Yang](https://sites.google.com/view/iigroup-thu/about) <br />

We present our NeurIPS 2025 paper Wan-Move, a simple and scalable motion-control framework for video generation. Wan-Move offers the following key features:
- 🎯 **High-Quality 5s 480p Motion Control**: Through scaled training, Wan-Move generates 5-second, 480p videos with SOTA motion controllability on par with commercial systems such as Kling 1.5 Pro's Motion Brush, as verified by user studies.
- 🧩 **Novel Latent Trajectory Guidance**: Our core idea is to represent the motion condition by propagating the first frame's features along the trajectory, which integrates seamlessly into off-the-shelf image-to-video models (e.g., Wan-I2V-14B) without any architecture change or extra motion modules.
- 🕹️ **Fine-grained Point-level Control**: Object motions are represented with dense point trajectories, enabling precise, region-level control over how each element in the scene moves.
- 📊 **Dedicated Motion-control Benchmark MoveBench**: MoveBench is a carefully curated benchmark with larger-scale samples, diverse content categories, longer video durations, and high-quality trajectory annotations.
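To picture the latent trajectory guidance, here is a toy sketch of the propagation idea: sample the first frame's features at each tracked point, then write them into every later frame at the point's tracked position. This is an illustrative simplification with made-up names and integer coordinates, not the repository's implementation, which operates on VAE latents with dense sub-pixel trajectories.

```python
# Toy illustration (not the official implementation) of latent trajectory
# guidance: copy each tracked point's first-frame feature to its position
# in every later frame, producing a motion-condition tensor.
import numpy as np

def propagate_features(first_frame_feat, tracks, visibility):
    """first_frame_feat: (H, W, C) features of frame 0.
    tracks: (T, N, 2) integer (x, y) positions of N tracked points.
    visibility: (T, N) boolean mask of which points are visible per frame."""
    T, N, _ = tracks.shape
    H, W, C = first_frame_feat.shape
    cond = np.zeros((T, H, W, C), dtype=first_frame_feat.dtype)
    # Features sampled at each point's location in frame 0.
    feat0 = first_frame_feat[tracks[0, :, 1], tracks[0, :, 0]]  # (N, C)
    for t in range(T):
        for n in range(N):
            if visibility[t, n]:
                x, y = tracks[t, n]
                cond[t, y, x] = feat0[n]
    return cond
```

Because the resulting condition tensor shares the spatial layout of the video latents, it can be fed to an off-the-shelf I2V model without architectural changes.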

🙌 We're glad to see Wan-Move being tested on real-world videos by many creators and users.

## 🔥 Latest News!!

* Dec 15, 2025: 👋 We've released a [local Gradio demo](#gradio-demo) for interactive trajectory drawing and video generation.
* Dec 10, 2025: 👋 We've released the [inference code](#quickstart), [model weights](https://huggingface.co/Ruihang/Wan-Move-14B-480P), and [MoveBench](https://huggingface.co/datasets/Ruihang/MoveBench) of Wan-Move.
* Sep 18, 2025: 👋 Wan-Move has been accepted by NeurIPS 2025! 🎉🎉🎉

## Community Works
* **[ComfyUI]** Thanks to Kijai for integrating Wan-Move into the ComfyUI wrapper: [https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/WanMove](https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/tree/main/WanMove)
* **[Wan2GP]** Thanks to deepbeepmeep for supporting Wan-Move in Wan2GP, which enables video generation with low VRAM: [https://github.com/deepbeepmeep/Wan2GP](https://github.com/deepbeepmeep/Wan2GP)

## 📑 Todo List
- Wan-Move-480P
    - [x] Multi-GPU inference code of the 14B models
    - [x] Checkpoints of the 14B models
    - [x] Data and evaluation code of MoveBench
    - [x] Gradio demo

## Introduction of Wan-Move

<p align="center" style="border-radius: 10px">
  <img src="assets/overview.png" width="100%" alt="logo"/>
  <strong>Wan-Move supports diverse motion-control applications in image-to-video generation. The generated samples (832×480, 5s) exhibit high visual fidelity and accurate motion.</strong>
</p>

<p align="center" style="border-radius: 10px">
  <img src="assets/framework.png" width="100%" alt="logo"/>
  <strong>The framework of Wan-Move: (a) how motion guidance is injected; (b) the training pipeline.</strong>
</p>

<p align="center" style="border-radius: 10px">
  <img src="assets/movebench.png" width="100%" alt="logo"/>
  <strong>The construction pipeline and statistics of MoveBench. Welcome everyone to use it!</strong>
</p>

<p align="center" style="border-radius: 10px">
  <img src="assets/main-comparison.png" width="100%" alt="logo"/>
  <strong>Qualitative comparisons between Wan-Move and both academic methods and commercial solutions.</strong>
</p>

## Quickstart

#### Installation

> 💡Note: Wan-Move is implemented as a minimal extension on top of the [Wan2.1](https://github.com/Wan-Video/Wan2.1) codebase. If you have tried Wan2.1, you can reuse most of your existing setup with very low migration cost.

Clone the repo:
```sh
git clone https://github.com/ali-vilab/Wan-Move.git
cd Wan-Move
```

Install dependencies:
```sh
# Ensure torch >= 2.4.0
pip install -r requirements.txt
```

#### Model Download

| Models | Download Link | Notes |
|--------|---------------|-------|
| Wan-Move-14B-480P | 🤗 [HuggingFace](https://huggingface.co/Ruihang/Wan-Move-14B-480P) 🤖 [ModelScope](https://modelscope.cn/models/churuihang/Wan-Move-14B-480P) | 5s 480P video generation |

Download the model using huggingface-cli:
```sh
pip install "huggingface_hub[cli]"
huggingface-cli download Ruihang/Wan-Move-14B-480P --local-dir ./Wan-Move-14B-480P
```

Download the model using modelscope-cli:
```sh
pip install modelscope
modelscope download churuihang/Wan-Move-14B-480P --local_dir ./Wan-Move-14B-480P
```
#### Evaluation on MoveBench

Download MoveBench from Hugging Face:
```sh
huggingface-cli download Ruihang/MoveBench --local-dir ./MoveBench --repo-type dataset
```

> 💡Note:
> * MoveBench provides video captions. For a fair evaluation, turn off the [prompt extension](https://github.com/Wan-Video/Wan2.1?tab=readme-ov-file#2-using-prompt-extension-1) function developed in Wan2.1.
> * MoveBench provides data in both English and Chinese. Select the language via the `--language` flag: `en` for English, `zh` for Chinese.

- Single-GPU inference

```sh
# For the single-object motion test, run:
python generate.py --task wan-move-i2v --size 480*832 --ckpt_dir ./Wan-Move-14B-480P --mode single --language en --save_path results/en --eval_bench

# For the multi-object motion test, run:
python generate.py --task wan-move-i2v --size 480*832 --ckpt_dir ./Wan-Move-14B-480P --mode multi --language en --save_path results/en --eval_bench
```

> 💡Note:
> * To visualize the trajectory motion effect shown in our video demo, add the `--vis_track` flag. We also provide a separate visualization script, `scripts/visualize.py`, to support different visualization settings, for example, enabling mouse-button effects! 😊😊😊
> * If you encounter OOM (Out-of-Memory) issues, use the `--offload_model True` and `--t5_cpu` options to reduce GPU memory usage.
> * The 14B model can run on a **single 40GB** GPU with `--t5_cpu --offload_model True --dtype bf16`! 🤗🤗🤗


- Multi-GPU inference

Following Wan2.1, Wan-Move also supports FSDP and [xDiT](https://github.com/xdit-project/xDiT) USP to accelerate inference. When running multi-GPU batch evaluation (e.g., evaluating MoveBench or a file containing multiple test cases), **disable** the [`Ulysses`](https://arxiv.org/abs/2309.14509) strategy by setting `--ulysses_size 1`; Ulysses is only supported when generating a single video with multi-GPU inference.

```sh
# For the single-object motion test, run:
torchrun --nproc_per_node=8 generate.py --task wan-move-i2v --size 480*832 --ckpt_dir ./Wan-Move-14B-480P --mode single --language en --save_path results/en --eval_bench --dit_fsdp --t5_fsdp

# For the multi-object motion test, run:
torchrun --nproc_per_node=8 generate.py --task wan-move-i2v --size 480*832 --ckpt_dir ./Wan-Move-14B-480P --mode multi --language en --save_path results/en --eval_bench --dit_fsdp --t5_fsdp
```
After all results are generated, update the results storage path inside `MoveBench/bench.py`, then run:

```sh
python MoveBench/bench.py
```

#### Run the Default Example

For single-video generation (rather than MoveBench evaluation), we also provide a sample case in the `examples` folder. You can directly run:

```sh
python generate.py \
--task wan-move-i2v \
--size 480*832 \
--ckpt_dir ./Wan-Move-14B-480P \
--image examples/example.jpg \
--track examples/example_tracks.npy \
--track_visibility examples/example_visibility.npy \
--prompt "A laptop is placed on a wooden table. The silver laptop is connected to a small grey external hard drive and transfers data through a white USB-C cable. The video is shot with a downward close-up lens." \
--save_file example.mp4
```
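To try your own motion instead of the bundled example, you need to produce the two `.npy` files yourself. The exact array layout is defined by the repo's loaders, so treat the sketch below as an assumption to verify against `generate.py`: it assumes tracks of shape `(num_frames, num_points, 2)` holding `(x, y)` pixel coordinates, a matching boolean visibility mask, and an 81-frame clip; the file names are made up.

```python
# Hypothetical sketch of building trajectory inputs: drag four points
# 120 px to the right over the clip. Shapes and the frame count are
# assumptions, not the repo's documented format -- check generate.py.
import os
import numpy as np

num_frames, num_points = 81, 4                       # assumed frame count
start = np.array([[100, 240], [200, 240], [300, 240], [400, 240]], dtype=float)
tracks = np.repeat(start[None], num_frames, axis=0)  # (81, 4, 2)
tracks[:, :, 0] += np.linspace(0, 120, num_frames)[:, None]
visibility = np.ones((num_frames, num_points), dtype=bool)

os.makedirs("examples", exist_ok=True)
np.save("examples/my_tracks.npy", tracks)
np.save("examples/my_visibility.npy", visibility)
```

Then pass `--track examples/my_tracks.npy --track_visibility examples/my_visibility.npy` to `generate.py` as above.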
#### Gradio Demo

We provide a local Gradio demo for interactive trajectory drawing and video generation.

1. **Launch the Demo**:
```bash
python gradio_app.py \
--task wan-move-i2v \
--size 480*832 \
--ckpt_dir ./Wan-Move-14B-480P \
--t5_cpu \
--offload_model True \
--dtype bf16 \
--port 7860 \
--share
```

2. **Features**:
    * **Multi-Trajectory Control**: Draw multiple trajectories with distinct colors.
    * **Speed Control**: Adjust the speed curve of each trajectory independently.
    * **Real-time Preview**: Visualize your drawn trajectories on the input image and as a GIF.
    * **Lazy Loading**: The model loads only when you start generation, ensuring fast startup.
    * **History Gallery**: View your previously generated videos.

3. **Usage**:
    * Upload an image.
    * Click on the image to add trajectory points.
    * (Optional) Adjust the speed curve in the editor.
    * Select "Create New..." in the dropdown to add more trajectories.
    * Click "Generate Video".
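
Conceptually, per-trajectory speed control amounts to reparameterizing the drawn path: the path fixes where the point travels, and the speed curve fixes how far along the path it is at each frame. The helper below is a plausible illustration of that idea with made-up names, not the demo's actual code.

```python
# Illustrative arc-length reparameterization for speed control (made-up
# helper, not from gradio_app.py): advance along a drawn polyline at a
# per-frame speed, returning one (x, y) position per frame.
import numpy as np

def resample_by_speed(path, speed):
    """path: (P, 2) drawn polyline; speed: (T-1,) positive step weights.
    Returns (T, 2) positions, one per frame, moving faster where speed is larger."""
    # Fraction of the total path length covered by each frame.
    frac = np.concatenate([[0.0], np.cumsum(speed, dtype=float)])
    frac /= frac[-1]
    # Normalized arc length of each polyline vertex.
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    arc /= arc[-1]
    # Interpolate x and y at the requested fractions.
    return np.stack([np.interp(frac, arc, path[:, 0]),
                     np.interp(frac, arc, path[:, 1])], axis=1)
```

A flat speed curve reproduces the drawn path at constant velocity; front-loading the curve makes the point dash early and ease out.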
## Citation
If you find our work helpful, please cite us.

```
@article{chu2025wan,
  title={Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance},
  author={Chu, Ruihang and He, Yefei and Chen, Zhekai and Zhang, Shiwei and Xu, Xiaogang and Xia, Bin and Wang, Dingdong and Yi, Hongwei and Liu, Xihui and Zhao, Hengshuang and others},
  journal={arXiv preprint arXiv:2512.08765},
  year={2025}
}
```