sxzheng committed on
Commit 457bc79 · verified · 1 Parent(s): afb09a2

Update README.md

Files changed (1)
  1. README.md +94 -95
README.md CHANGED
@@ -1,95 +1,94 @@
- ---
- license: other
- license_name: versecrafter-license
- license_link: LICENSE
- tags:
- - video-generation
- - image-to-video
- - diffusion
- - 4d-control
- - camera-control
- - object-motion
- - world-model
- language:
- - en
- base_model:
- - Wan-AI/Wan2.1-T2V-14B
- pipeline_tag: image-to-video
- ---
- <p align="center">
- <img src="assets/versecrafter.png" alt="VerseCrafter Logo" width="300">
- </p>
-
- <h2 align="center">
- <a href="INSERT_ARXIV_LINK">
- VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
- </a>
- </h2>
-
- <a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
- <a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a>&nbsp;
- <a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>&nbsp;
- <a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='GitHub'></a>&nbsp;
-
- <p align="center">
- <a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup> &nbsp;&nbsp;
- <a href="#">Minghao Yin</a><sup>3</sup> &nbsp;&nbsp;
- <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup> &nbsp;&nbsp;
- <a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup> &nbsp;&nbsp;
- <a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup> &nbsp;&nbsp;
- <a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup>
- </p>
-
- <p align="center">
- <sup>1</sup>Fudan University &nbsp;&nbsp; <sup>2</sup>Shanghai Innovation Institute &nbsp;&nbsp; <sup>3</sup>HKU &nbsp;&nbsp; <sup>4</sup>ARC Lab, Tencent PCG
- </p>
-
- <p align="center">
- <sup>†</sup>Corresponding authors
- </p>
-
- ✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
-
- ## TL;DR
-
- - **Dynamic Realistic Video World Model**: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatial-temporal coherence.
- - **4D Geometric Control**: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
- - **Frozen Video Prior + GeoAdapter**: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
- - **VerseControl4D Dataset**: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
-
-
- ## Model Details
-
- | Property | Value |
- |----------|-------|
- | **Base Model** | [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
- | **Resolution** | 720 × 1280 |
- | **Frames** | 81 frames |
- | **Control Signals** | Camera trajectory + 3D Gaussian object trajectories |
- | **Architecture** | Frozen DiT + Trainable GeoAdapter |
-
- ## Usage
-
- For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our [GitHub repository](https://github.com/TencentARC/VerseCrafter).
-
- ## Citation
-
- If you find this work useful, please consider citing:
-
- ```bibtex
- @article{zheng2026versecrafter,
- title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
- author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
- journal={arXiv preprint arXiv:2601.05138},
- year={2026}
- }
- ```
-
- ## Acknowledgements
-
- Our work is built upon [MoGe](https://github.com/microsoft/MoGe), [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [diffusers](https://github.com/huggingface/diffusers).
-
- ## License
-
- This project is released under the [VerseCrafter License](LICENSE). It is intended for **academic/research purposes only** and commercial use is not permitted.
-
 
+ ---
+ license: other
+ license_name: versecrafter-license
+ license_link: LICENSE
+ tags:
+ - video-generation
+ - image-to-video
+ - diffusion
+ - 4d-control
+ - camera-control
+ - object-motion
+ - world-model
+ language:
+ - en
+ base_model:
+ - Wan-AI/Wan2.1-T2V-14B
+ pipeline_tag: image-to-video
+ extra_gated_eu_disallowed: true
+ ---
+ <p align="center">
+ <img src="assets/versecrafter.png" alt="VerseCrafter Logo" width="300">
+ </p>
+
+ <h2 align="center">
+ VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
+ </h2>
+
+ <a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
+ <a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a>&nbsp;
+ <a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>&nbsp;
+ <a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='GitHub'></a>&nbsp;
+
+ <p align="center">
+ <a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup> &nbsp;&nbsp;
+ <a href="#">Minghao Yin</a><sup>3</sup> &nbsp;&nbsp;
+ <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup> &nbsp;&nbsp;
+ <a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup> &nbsp;&nbsp;
+ <a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup> &nbsp;&nbsp;
+ <a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup>
+ </p>
+
+ <p align="center">
+ <sup>1</sup>Fudan University &nbsp;&nbsp; <sup>2</sup>Shanghai Innovation Institute &nbsp;&nbsp; <sup>3</sup>HKU &nbsp;&nbsp; <sup>4</sup>ARC Lab, Tencent PCG
+ </p>
+
+ <p align="center">
+ <sup>†</sup>Corresponding authors
+ </p>
+
+ ✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
+
+ ## TL;DR
+
+ - **Dynamic Realistic Video World Model**: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatial-temporal coherence.
+ - **4D Geometric Control**: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
+ - **Frozen Video Prior + GeoAdapter**: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
+ - **VerseControl4D Dataset**: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
+
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Base Model** | [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
+ | **Resolution** | 720 × 1280 |
+ | **Frames** | 81 frames |
+ | **Control Signals** | Camera trajectory + 3D Gaussian object trajectories |
+ | **Architecture** | Frozen DiT + Trainable GeoAdapter |
+
+ ## Usage
+
+ For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our [GitHub repository](https://github.com/TencentARC/VerseCrafter).
+
+ ## Citation
+
+ If you find this work useful, please consider citing:
+
+ ```bibtex
+ @article{zheng2026versecrafter,
+ title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
+ author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
+ journal={arXiv preprint arXiv:2601.05138},
+ year={2026}
+ }
+ ```
+
+ ## Acknowledgements
+
+ Our work is built upon [MoGe](https://github.com/microsoft/MoGe), [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [diffusers](https://github.com/huggingface/diffusers).
+
+ ## License
+
+ This project is released under the [VerseCrafter License](LICENSE). It is intended for **academic/research purposes only** and commercial use is not permitted.
+
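
Every line of `README.md` is marked as changed in this diff, yet most removed and added lines are textually identical. Comparing the two sides, only three lines differ in content: the front-matter key `extra_gated_eu_disallowed: true` is added, and the placeholder `<a href="INSERT_ARXIV_LINK">` wrapper around the title is removed together with its closing `</a>`. The Python sketch below shows one way such a whitespace-insensitive comparison could be done. It assumes, without confirmation from the commit itself, that the remaining lines differ only in line endings or trailing whitespace. The function `meaningful_changes` and the shortened sample strings are hypothetical and serve only as illustration.

```python
from difflib import SequenceMatcher


def meaningful_changes(old: str, new: str) -> list[str]:
    """Return the lines that differ once line endings and trailing whitespace are ignored."""
    # splitlines() handles both "\n" and "\r\n"; rstrip() drops trailing spaces.
    old_lines = [line.rstrip() for line in old.splitlines()]
    new_lines = [line.rstrip() for line in new.splitlines()]

    changes: list[str] = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # "delete", "insert" and "replace" all reduce to removed plus added lines.
        changes += [f"- {line}" for line in old_lines[i1:i2]]
        changes += [f"+ {line}" for line in new_lines[j1:j2]]
    return changes


# Shortened, hypothetical excerpts of the two README versions.
# The old side uses CRLF line endings purely to illustrate the assumption above.
OLD = (
    "pipeline_tag: image-to-video\r\n"
    "---\r\n"
    '<h2 align="center">\r\n'
    '<a href="INSERT_ARXIV_LINK">\r\n'
    "VerseCrafter\r\n"
    "</a>\r\n"
    "</h2>\r\n"
)
NEW = (
    "pipeline_tag: image-to-video\n"
    "extra_gated_eu_disallowed: true\n"
    "---\n"
    '<h2 align="center">\n'
    "VerseCrafter\n"
    "</h2>\n"
)

if __name__ == "__main__":
    for change in meaningful_changes(OLD, NEW):
        print(change)
```

Run on the full old and new file contents, the same function would list only the content-level changes, provided the whitespace assumption holds.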