---
license: other
license_name: versecrafter-license
license_link: LICENSE
tags:
- video-generation
- image-to-video
- diffusion
- 4d-control
- camera-control
- object-motion
- world-model
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-14B
pipeline_tag: image-to-video
extra_gated_eu_disallowed: true
---
<p align="center">
  <img src="assets/versecrafter.png" alt="VerseCrafter Logo" width="300">
</p>

<h2 align="center">
  VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
</h2>

<a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>
<a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a>
<a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging%20Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>
<a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='project page'></a>

<p align="center">
  <a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup>
  <a href="#">Minghao Yin</a><sup>3</sup>
  <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup>
  <a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup>
  <a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup>
  <a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup>
</p>

<p align="center">
  <sup>1</sup>Fudan University <sup>2</sup>Shanghai Innovation Institute <sup>3</sup>HKU <sup>4</sup>ARC Lab, Tencent PCG
</p>

<p align="center">
  <sup>†</sup>Corresponding authors
</p>

✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.

## TL;DR

- **Dynamic Realistic Video World Model**: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatiotemporal coherence.
- **4D Geometric Control**: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
- **Frozen Video Prior + GeoAdapter**: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
- **VerseControl4D Dataset**: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
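
The unified 4D control state pairs a per-frame camera trajectory with per-object 3D Gaussian trajectories. The sketch below is a hypothetical illustration of how such a state could be represented and how an object's Gaussian center lands in each frame once camera motion is applied, using a standard pinhole projection. The class and function names (`CameraPose`, `ControlState4D`, `project_point`, `object_track_2d`), the world-to-camera extrinsics convention, and the intrinsics values are all assumptions; this is not the repository's API or the authors' renderer.

```python
from dataclasses import dataclass, field
from typing import Optional

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]
IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass
class CameraPose:
    """Hypothetical world-to-camera extrinsics: x_cam = R @ x_world + t."""
    R: Mat3
    t: Vec3


@dataclass
class ControlState4D:
    """Hypothetical 4D control state: one camera pose per frame plus, per object,
    the center of its 3D Gaussian at each frame (covariances omitted here)."""
    cameras: list[CameraPose]
    object_means: dict[str, list[Vec3]] = field(default_factory=dict)


def project_point(p: Vec3, pose: CameraPose, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[tuple[float, float]]:
    """Pinhole projection of a world point; None if it falls behind the camera."""
    x, y, z = [sum(pose.R[i][j] * p[j] for j in range(3)) + pose.t[i] for i in range(3)]
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def object_track_2d(state: ControlState4D, name: str, fx: float, fy: float,
                    cx: float, cy: float) -> list[Optional[tuple[float, float]]]:
    """Per-frame pixel position of one object's Gaussian center along the camera path."""
    return [project_point(mean, pose, fx, fy, cx, cy)
            for mean, pose in zip(state.object_means[name], state.cameras)]


# Usage: the camera slides 1 unit to the right between two frames (so t_x = -1).
# Intrinsics are assumed values for a 1280 x 720 frame.
state = ControlState4D(
    cameras=[CameraPose(IDENTITY, (0.0, 0.0, 0.0)), CameraPose(IDENTITY, (-1.0, 0.0, 0.0))],
    object_means={
        "static_box": [(0.0, 0.0, 5.0), (0.0, 0.0, 5.0)],  # fixed in the world
        "follower": [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)],    # moves with the camera
    },
)
print(object_track_2d(state, "static_box", 1000.0, 1000.0, 640.0, 360.0))
# [(640.0, 360.0), (440.0, 360.0)]
print(object_track_2d(state, "follower", 1000.0, 1000.0, 640.0, 360.0))
# [(640.0, 360.0), (640.0, 360.0)]
```

The example shows why camera and object motion have to be specified jointly: a world-static object drifts left on screen when the camera moves right, while an object moving with the camera stays centered.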

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
| **Resolution** | 720 × 1280 |
| **Frames** | 81 frames |
| **Control Signals** | Camera trajectory + 3D Gaussian object trajectories |
| **Architecture** | Frozen DiT + Trainable GeoAdapter |
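
For a rough sense of scale at the 720 × 1280, 81-frame setting in the table, the arithmetic below assumes VerseCrafter keeps Wan2.1's defaults: a causal video VAE that keeps the first frame and compresses the remaining frames 4× in time and 8× in each spatial dimension, followed by a DiT patch size of 1 × 2 × 2. These factors come from the Wan2.1 release rather than from this model card, and the helper name `wan_token_count` is hypothetical, so treat the result as an estimate.

```python
def wan_token_count(frames: int, height: int, width: int,
                    t_stride: int = 4, s_stride: int = 8,
                    patch: tuple[int, int, int] = (1, 2, 2)) -> dict[str, int]:
    """Estimate latent and DiT token sizes, assuming Wan2.1-style compression.

    Assumptions (from Wan2.1, not stated in this card): a causal VAE keeps the
    first frame and compresses the rest t_stride-fold in time, compresses
    s_stride-fold spatially, and the DiT patchifies latents with `patch`.
    """
    if (frames - 1) % t_stride:
        raise ValueError("a causal VAE expects frames = 1 + k * t_stride")
    if height % (s_stride * patch[1]) or width % (s_stride * patch[2]):
        raise ValueError("height and width must divide evenly into patches")
    lat_t = 1 + (frames - 1) // t_stride          # first frame + compressed rest
    lat_h, lat_w = height // s_stride, width // s_stride
    tokens = (lat_t // patch[0]) * (lat_h // patch[1]) * (lat_w // patch[2])
    return {"latent_frames": lat_t, "latent_h": lat_h, "latent_w": lat_w, "tokens": tokens}


print(wan_token_count(81, 720, 1280))
# {'latent_frames': 21, 'latent_h': 90, 'latent_w': 160, 'tokens': 75600}
```

Under these assumptions, each DiT forward pass at the listed resolution processes about 75.6k video tokens (21 × 45 × 80).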

## Usage

For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our [GitHub repository](https://github.com/TencentARC/VerseCrafter).
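
The pipeline above includes a 3D Gaussian fitting step. As a hedged illustration of what that step produces, the sketch below lifts an object's masked depth pixels into camera space with a pinhole model and summarizes them by their sample mean and covariance, which is one simple way to obtain a single 3D Gaussian per object. The function names (`backproject`, `fit_gaussian`), the intrinsics, and the moment-matching approach are assumptions; the repository's actual fitting procedure may differ.

```python
Point3 = tuple[float, float, float]


def backproject(depth: list[list[float]], mask: list[list[bool]],
                fx: float, fy: float, cx: float, cy: float) -> list[Point3]:
    """Lift masked pixels (u, v) with depth d > 0 to camera-space 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if mask[v][u] and d > 0:
                points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def fit_gaussian(points: list[Point3]) -> tuple[Point3, list[list[float]]]:
    """Moment-match one 3D Gaussian: sample mean and 1/N covariance."""
    n = len(points)
    if n == 0:
        raise ValueError("cannot fit a Gaussian to zero points")
    mean = tuple(sum(p[i] for p in points) / n for i in range(3))
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(3)]
           for i in range(3)]
    return mean, cov


# Usage with a toy 2 x 2 depth map (every pixel belongs to the object, depth 2.0).
depth = [[2.0, 2.0], [2.0, 2.0]]
mask = [[True, True], [True, True]]
mean, cov = fit_gaussian(backproject(depth, mask, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
print(mean)  # (0.0, 0.0, 2.0)
print(cov)   # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

Repeating the fit on each frame's mask and depth would give a per-object trajectory of Gaussians of the kind listed under "Control Signals".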

## Citation

If you find this work useful, please consider citing:

```bibtex
@article{zheng2026versecrafter,
  title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
  author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
  journal={arXiv preprint arXiv:2601.05138},
  year={2026}
}
```

## Acknowledgements

Our work is built upon [MoGe](https://github.com/microsoft/MoGe), [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [diffusers](https://github.com/huggingface/diffusers).

## License

This project is released under the [VerseCrafter License](LICENSE). It is intended for **academic/research purposes only**; commercial use is not permitted.