@@ -1,95 +1,94 @@
----
-license: other
-license_name: versecrafter-license
-license_link: LICENSE
-tags:
-  - video-generation
-  - image-to-video
-  - diffusion
-  - 4d-control
-  - camera-control
-  - object-motion
-  - world-model
-language:
-  - en
-base_model:
-  - Wan-AI/Wan2.1-T2V-14B
-pipeline_tag: image-to-video
----
-<p align="center">
-  <img src="assets/versecrafter.png" alt="VerseCrafter Logo" width="300">
-</p>
-<h2 align="center">
-  <a href="INSERT_ARXIV_LINK">
-    VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
-  </a>
-</h2>
-<a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
-<a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a>&nbsp;
-<a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>&nbsp;
-<a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='GitHub'></a>&nbsp;
-<p align="center">
-  <a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup> &nbsp;&nbsp;
-  <a href="#">Minghao Yin</a><sup>3</sup> &nbsp;&nbsp;
-  <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup> &nbsp;&nbsp;
-  <a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup> &nbsp;&nbsp;
-  <a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup> &nbsp;&nbsp;
-  <a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup>
-</p>
-<p align="center">
-  <sup>1</sup>Fudan University &nbsp;&nbsp; <sup>2</sup>Shanghai Innovation Institute &nbsp;&nbsp; <sup>3</sup>HKU &nbsp;&nbsp; <sup>4</sup>ARC Lab, Tencent PCG
-</p>
-<p align="center">
-  <sup>†</sup>Corresponding authors
-</p>
-✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
-## TL;DR
-- **Dynamic Realistic Video World Model**: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatial-temporal coherence.
-- **4D Geometric Control**: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
-- **Frozen Video Prior + GeoAdapter**: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
-- **VerseControl4D Dataset**: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
-## Model Details
-| Property | Value |
-|----------|-------|
-| **Base Model** | [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
-| **Resolution** | 720 × 1280 |
-| **Frames** | 81 frames |
-| **Control Signals** | Camera trajectory + 3D Gaussian object trajectories |
-| **Architecture** | Frozen DiT + Trainable GeoAdapter |
-## Usage
-For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our [GitHub repository](https://github.com/TencentARC/VerseCrafter).
-## Citation
-If you find this work useful, please consider citing:
-```bibtex
-@article{zheng2026versecrafter,
-  title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
-  author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
-  journal={arXiv preprint arXiv:2601.05138},
-  year={2026}
-}
-```
-## Acknowledgements
-Our work is built upon [MoGe](https://github.com/microsoft/MoGe), [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [diffusers](https://github.com/huggingface/diffusers).
-## License
-This project is released under the [VerseCrafter License](LICENSE). It is intended for **academic/research purposes only** and commercial use is not permitted.

+---
+license: other
+license_name: versecrafter-license
+license_link: LICENSE
+tags:
+  - video-generation
+  - image-to-video
+  - diffusion
+  - 4d-control
+  - camera-control
+  - object-motion
+  - world-model
+language:
+  - en
+base_model:
+  - Wan-AI/Wan2.1-T2V-14B
+pipeline_tag: image-to-video
+extra_gated_eu_disallowed: true
+---
+<p align="center">
+  <img src="assets/versecrafter.png" alt="VerseCrafter Logo" width="300">
+</p>
+<h2 align="center">
+    VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
+</h2>
+<a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
+<a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a>&nbsp;
+<a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging%20Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>&nbsp;
+<a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='project page'></a>&nbsp;
+<p align="center">
+  <a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup> &nbsp;&nbsp;
+  <a href="#">Minghao Yin</a><sup>3</sup> &nbsp;&nbsp;
+  <a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup> &nbsp;&nbsp;
+  <a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup> &nbsp;&nbsp;
+  <a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup> &nbsp;&nbsp;
+  <a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup>
+</p>
+<p align="center">
+  <sup>1</sup>Fudan University &nbsp;&nbsp; <sup>2</sup>Shanghai Innovation Institute &nbsp;&nbsp; <sup>3</sup>HKU &nbsp;&nbsp; <sup>4</sup>ARC Lab, Tencent PCG
+</p>
+<p align="center">
+  <sup>†</sup>Corresponding authors
+</p>
+✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
+## TL;DR
+- **Dynamic Realistic Video World Model**: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatiotemporal coherence.
+- **4D Geometric Control**: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
+- **Frozen Video Prior + GeoAdapter**: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
+- **VerseControl4D Dataset**: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
+## Model Details
+| Property | Value |
+|----------|-------|
+| **Base Model** | [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
+| **Resolution** | 720 × 1280 |
+| **Frames** | 81 frames |
+| **Control Signals** | Camera trajectory + 3D Gaussian object trajectories |
+| **Architecture** | Frozen DiT + Trainable GeoAdapter |
+## Usage
+For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our [GitHub repository](https://github.com/TencentARC/VerseCrafter).
+## Citation
+If you find this work useful, please consider citing:
+```bibtex
+@article{zheng2026versecrafter,
+  title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
+  author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
+  journal={arXiv preprint arXiv:2601.05138},
+  year={2026}
+}
+```
+## Acknowledgements
+Our work is built upon [MoGe](https://github.com/microsoft/MoGe), [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [diffusers](https://github.com/huggingface/diffusers).
+## License
+This project is released under the [VerseCrafter License](LICENSE). It is intended for **academic/research purposes only**; commercial use is not permitted.
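The Model Details table lists 81 frames at 720 × 1280. As a rough, hedged illustration of what that means for the backbone, the sketch below assumes that VerseCrafter keeps the Wan2.1 VAE (4× temporal and 8× spatial compression, with the first frame encoded on its own) and the Wan2.1 DiT patch size of 1 × 2 × 2. It also reads 720 × 1280 as height × width. None of these factors are stated in this card, and `latent_shape` and `num_tokens` are hypothetical helpers, not part of the VerseCrafter code.

```python
# Sketch: latent and token sizes for an 81-frame 720 x 1280 clip.
# ASSUMPTIONS (not stated in the model card): the Wan2.1 VAE compresses time
# by 4 and each spatial axis by 8, encoding the first frame separately, and
# the DiT patchifies latents with a (1, 2, 2) patch.

VAE_T, VAE_S = 4, 8        # assumed temporal / spatial VAE strides
PATCH_T, PATCH_S = 1, 2    # assumed DiT patch size (time, space)


def latent_shape(frames: int, height: int, width: int) -> tuple:
    """Return (latent_frames, latent_height, latent_width) under the assumptions above."""
    # A causal VAE with temporal stride 4 encodes frame 0 alone, then groups
    # of 4 frames, so the frame count must have the form 4k + 1.
    if (frames - 1) % VAE_T != 0:
        raise ValueError(f"frames must be {VAE_T}k + 1, got {frames}")
    if height % VAE_S or width % VAE_S:
        raise ValueError(f"height and width must be divisible by {VAE_S}")
    return (frames - 1) // VAE_T + 1, height // VAE_S, width // VAE_S


def num_tokens(frames: int, height: int, width: int) -> int:
    """Number of DiT tokens after patchifying the latent video."""
    t, h, w = latent_shape(frames, height, width)
    return (t // PATCH_T) * (h // PATCH_S) * (w // PATCH_S)


if __name__ == "__main__":
    print(latent_shape(81, 720, 1280))  # (21, 90, 160)
    print(num_tokens(81, 720, 1280))    # 75600
```

Under these assumptions, the 4k + 1 rule explains why the clip length is 81 rather than 80 frames. The token count does not depend on whether the resolution is read as height × width or width × height.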