VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
Sixiao Zheng1,2 Minghao Yin3 Wenbo Hu4† Xiaoyu Li4 Ying Shan4 Yanwei Fu1,2†
1Fudan University 2Shanghai Innovation Institute 3HKU 4ARC Lab, Tencent PCG
†Corresponding authors
✨ A controllable video world model with explicit 4D geometric control over camera and multi-object motion.
TL;DR
- Dynamic Realistic Video World Model: VerseCrafter learns a realistic and controllable video world prior from large-scale in-the-wild data, handling challenging dynamic scenes with strong spatial-temporal coherence.
- 4D Geometric Control: A unified 4D control state provides direct, interpretable control over camera motion, multi-object motion, and their joint coordination, improving geometric faithfulness.
- Frozen Video Prior + GeoAdapter: We attach a geometry-aware GeoAdapter to a frozen Wan2.1 backbone, injecting 4D controls into diffusion blocks for precise control without sacrificing video quality.
- VerseControl4D Dataset: We introduce a large-scale real-world dataset with automatically rendered camera trajectories and multi-object 3D Gaussian trajectories to supervise 4D controllable generation.
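To make the idea of a unified 4D control state more concrete, here is a minimal sketch of how per-frame camera poses and per-object 3D Gaussian trajectories could be held in one structure. All names (`CameraPose`, `Gaussian3D`, `ControlState4D`, `project`) are hypothetical and chosen for illustration. The world-to-camera convention and the pinhole projection are assumptions, not VerseCrafter's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class CameraPose:
    """World-to-camera pose for one frame: x_cam = R @ x_world + t (assumed convention)."""
    rotation: Mat3
    translation: Vec3


@dataclass
class Gaussian3D:
    """One object's 3D Gaussian at one frame, in world coordinates."""
    mean: Vec3
    covariance: Mat3


@dataclass
class ControlState4D:
    """Hypothetical container: a camera path plus one Gaussian track per object."""
    cameras: List[CameraPose]
    objects: List[List[Gaussian3D]] = field(default_factory=list)

    def validate(self) -> None:
        # Every object track must cover the same frames as the camera path.
        for track in self.objects:
            if len(track) != len(self.cameras):
                raise ValueError("object track length must match camera path length")


def world_to_camera(pose: CameraPose, point: Vec3) -> Vec3:
    """Apply x_cam = R @ x_world + t with plain Python arithmetic."""
    return tuple(
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(pose.rotation, pose.translation)
    )


def project(pose: CameraPose, point: Vec3, focal: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a world point to pixel coordinates."""
    x, y, z = world_to_camera(pose, point)
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z + cx, focal * y / z + cy)
```

Keeping camera and object motion in the same world frame is what lets the two be edited jointly: moving the camera changes where each Gaussian mean projects, while the Gaussians themselves stay fixed in world coordinates.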
Model Details
| Property | Value |
|---|---|
| Base Model | Wan2.1-T2V-14B |
| Resolution | 720 × 1280 |
| Frames | 81 |
| Control Signals | Camera trajectory + 3D Gaussian object trajectories |
| Architecture | Frozen DiT + Trainable GeoAdapter |
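The "Frozen DiT + Trainable GeoAdapter" row follows a general adapter pattern: the pretrained block is left untouched and a small trainable branch adds a control-dependent residual. The toy sketch below illustrates only that pattern with plain Python lists. `frozen_block`, `ToyAdapter`, and `block_with_adapter` are hypothetical stand-ins, and the zero initialisation is a common convention for adapters that is assumed here, not a confirmed detail of GeoAdapter.

```python
from typing import List

Vector = List[float]


def frozen_block(x: Vector) -> Vector:
    """Stand-in for a pretrained block whose weights are never updated."""
    return [2.0 * v + 1.0 for v in x]


class ToyAdapter:
    """Hypothetical trainable branch mapping a control vector to a residual."""

    def __init__(self, dim: int) -> None:
        # Zero initialisation (assumed): the adapter contributes nothing before training.
        self.weight = [0.0] * dim

    def __call__(self, control: Vector) -> Vector:
        # Element-wise scaling keeps the example tiny; a real adapter would be a network.
        return [w * c for w, c in zip(self.weight, control)]


def block_with_adapter(x: Vector, control: Vector, adapter: ToyAdapter) -> Vector:
    """Frozen output plus the adapter's control-dependent residual."""
    base = frozen_block(x)
    residual = adapter(control)
    return [b + r for b, r in zip(base, residual)]
```

With the adapter weights at zero, `block_with_adapter` returns exactly what `frozen_block` returns, so the pretrained behaviour is the starting point and only the adapter parameters need to change during training.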
Usage
For installation, inference, and the complete pipeline (depth estimation, segmentation, 3D Gaussian fitting, trajectory customization in Blender, and video generation), please refer to our GitHub repository.
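As a rough illustration of the 3D Gaussian fitting step in that pipeline, the sketch below back-projects masked depth pixels to 3D points and summarises them by their sample mean and covariance. This is simple moment matching under an assumed pinhole camera. The function names and parameters are hypothetical, and the repository's actual fitting procedure may differ.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def backproject(u: float, v: float, depth: float,
                focal: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to a camera-space 3D point (pinhole model)."""
    return ((u - cx) * depth / focal, (v - cy) * depth / focal, depth)


def fit_gaussian_3d(points: List[Point]) -> Tuple[List[float], List[List[float]]]:
    """Return the sample mean and unbiased sample covariance of 3D points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [
        [
            sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
            for j in range(3)
        ]
        for i in range(3)
    ]
    return mean, cov
```

In such a setup, the pixels passed to `backproject` would be the ones inside an object's segmentation mask, so each object yields one mean (its position) and one covariance (its rough extent and orientation) per frame.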
Citation
If you find this work useful, please consider citing:
@article{zheng2026versecrafter,
title={VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control},
author={Zheng, Sixiao and Yin, Minghao and Hu, Wenbo and Li, Xiaoyu and Shan, Ying and Fu, Yanwei},
journal={arXiv preprint arXiv:2601.05138},
year={2026}
}
Acknowledgements
Our work is built upon MoGe, Grounded-SAM-2, VideoX-Fun, Wan2.1 and diffusers.
License
This project is released under the VerseCrafter License. It is intended for academic and research use only; commercial use is not permitted.