---
license: other
license_name: versecrafter-license
license_link: LICENSE
tags:
- video-generation
- image-to-video
- diffusion
- 4d-control
- camera-control
- object-motion
- world-model
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-14B
pipeline_tag: image-to-video
---
<p align="center">
<img src="assets/versecrafter.png" alt="VerseCrafter Logo" width="300">
</p>
<h2 align="center">
<a href="https://arxiv.org/pdf/2601.05138">
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
</a>
</h2>
<a href="https://arxiv.org/pdf/2601.05138"><img src='https://img.shields.io/badge/arXiv-Paper-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
<a href="https://github.com/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=GitHub' alt='github'></a>&nbsp;
<a href="https://huggingface.co/TencentARC/VerseCrafter"><img src='https://img.shields.io/badge/Hugging Face-ckpts-orange?style=flat&logo=HuggingFace&logoColor=orange' alt='huggingface'></a>&nbsp;
<a href="https://sixiaozheng.github.io/VerseCrafter_page/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='project page'></a>&nbsp;
<p align="center">
<a href="https://sixiaozheng.github.io/">Sixiao Zheng</a><sup>1,2</sup> &nbsp;&nbsp;
<a href="#">Minghao Yin</a><sup>3</sup> &nbsp;&nbsp;
<a href="https://wbhu.github.io/">Wenbo Hu</a><sup>4†</sup> &nbsp;&nbsp;
<a href="https://xiaoyu258.github.io/">Xiaoyu Li</a><sup>4</sup> &nbsp;&nbsp;
<a href="https://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>4</sup> &nbsp;&nbsp;
<a href="https://yanweifu.github.io/">Yanwei Fu</a><sup>1,2†</sup>
</p>
<p align="center">
<sup>1</sup>Fudan University &nbsp;&nbsp; <sup>2</sup>Shanghai Innovation Institute &nbsp;&nbsp; <sup>3</sup>HKU &nbsp;&nbsp; <sup>4</sup>ARC Lab, Tencent PCG
</p>
<p align="center">
<sup>†</sup>Corresponding authors
</p>
✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
## TL;DR
- **Dynamic Realistic Video World Model**: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatial-temporal coherence.
- **4D Geometric Control**: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
- **Frozen Video Prior + GeoAdapter**: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
- **VerseControl4D Dataset**: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
| **Resolution** | 720 × 1280 |
| **Frames** | 81 frames |
| **Control Signals** | Camera trajectory + 3D Gaussian object trajectories |
| **Architecture** | Frozen DiT + Trainable GeoAdapter |
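
To make the control signals in the table more concrete, here is a minimal sketch of how a per-clip 4D control state could be laid out: one camera pose per frame plus, for each object, one 3D Gaussian (mean and covariance) per frame. All class and function names (`Gaussian3D`, `ObjectTrajectory`, `ControlState4D`, `make_demo_state`), the 4×4 pose convention, and the demo values are illustrative assumptions. This is not the file format or API used by the VerseCrafter code; only the 81-frame clip length comes from the table above.

```python
from dataclasses import dataclass, field
from typing import List

NUM_FRAMES = 81  # clip length from the Model Details table

Vec3 = List[float]
Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


@dataclass
class Gaussian3D:
    """One object at one frame: a 3D center and a 3x3 covariance (hypothetical layout)."""
    mean: Vec3
    covariance: Matrix


@dataclass
class ObjectTrajectory:
    """Per-frame Gaussians for a single object."""
    label: str
    gaussians: List[Gaussian3D]


@dataclass
class ControlState4D:
    """Camera trajectory plus any number of object trajectories (hypothetical layout)."""
    camera_poses: List[Matrix]  # assumed: one 4x4 pose matrix per frame
    objects: List[ObjectTrajectory] = field(default_factory=list)

    def validate(self, num_frames: int = NUM_FRAMES) -> None:
        """Check that every trajectory covers exactly num_frames frames."""
        if len(self.camera_poses) != num_frames:
            raise ValueError(f"expected {num_frames} camera poses, got {len(self.camera_poses)}")
        for obj in self.objects:
            if len(obj.gaussians) != num_frames:
                raise ValueError(f"object '{obj.label}' has {len(obj.gaussians)} frames, expected {num_frames}")


def make_demo_state(num_frames: int = NUM_FRAMES) -> ControlState4D:
    """Static camera, one object sliding one unit along +x over the clip."""
    poses = [identity(4) for _ in range(num_frames)]
    step = 1.0 / (num_frames - 1)
    gaussians = [
        Gaussian3D(mean=[t * step, 0.0, 2.0], covariance=identity(3))
        for t in range(num_frames)
    ]
    return ControlState4D(camera_poses=poses, objects=[ObjectTrajectory("car", gaussians)])
```

Calling `make_demo_state().validate()` passes silently, while a state whose camera or object trajectory has the wrong number of frames raises `ValueError`. Keeping camera and object motion in one structure mirrors the idea of a unified control state, where both can be edited jointly.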
## Usage
For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our [GitHub repository](https://github.com/TencentARC/VerseCrafter).
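
As a starting point, the checkpoints hosted on this page can be fetched with the `huggingface_hub` library. The sketch below assumes `huggingface_hub` is installed (`pip install huggingface_hub`); the repository ID is taken from the Hugging Face badge above, while the helper name `download_checkpoints` and the target directory `checkpoints/VerseCrafter` are placeholders. The GitHub repository may expect a different directory layout, so check its instructions before running inference.

```python
REPO_ID = "TencentARC/VerseCrafter"  # from the Hugging Face badge link on this page


def download_checkpoints(local_dir: str = "checkpoints/VerseCrafter") -> str:
    """Download all files of the model repo and return the local folder path.

    The default local_dir is a placeholder, not a path required by VerseCrafter.
    """
    # Imported lazily so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoints()` downloads the repository contents into the chosen directory and returns its path; the download requires network access.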
## Citation
If you find this work useful, please consider citing:
```bibtex
@article{zheng2026versecrafter,
title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
journal={arXiv preprint arXiv:2601.05138},
year={2026}
}
```
## Acknowledgements
Our work builds on [MoGe](https://github.com/microsoft/MoGe), [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun), [Wan2.1](https://github.com/Wan-Video/Wan2.1), and [diffusers](https://github.com/huggingface/diffusers).
## License
This project is released under the [VerseCrafter License](LICENSE). It is intended for **academic/research purposes only**; commercial use is not permitted.