---
language:
- en
license: other
library_name: motioncrafter
tags:
- motion
- video
- 4d
- diffusion
- scene-flow
pipeline_tag: image-to-3d
base_model: stabilityai/stable-video-diffusion-img2vid-xt
---

# MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

[Ruijie Zhu](https://ruijiezhu94.github.io/ruijiezhu/)<sup>1,2</sup>, [Jiahao Lu](https://scholar.google.com/citations?user=cRpteW4AAAAJ&hl=en)<sup>3</sup>, [Wenbo Hu](https://wbhu.github.io/)<sup>2</sup>, [Xiaoguang Han](https://scholar.google.com/citations?user=z-rqsR4AAAAJ&hl=en)<sup>4</sup>, [Jianfei Cai](https://jianfei-cai.github.io/)<sup>5</sup>, [Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)<sup>2</sup>, [Chuanxia Zheng](https://physicalvision.github.io/people/~chuanxia)<sup>1</sup>

<sup>1</sup> NTU&nbsp;&nbsp;<sup>2</sup> ARC Lab, Tencent PCG&nbsp;&nbsp;<sup>3</sup> HKUST&nbsp;&nbsp;<sup>4</sup> CUHK(SZ)&nbsp;&nbsp;<sup>5</sup> Monash University

[📄 Paper](https://arxiv.org/abs/2602.08961) | [🌐 Project Page](https://ruijiezhu94.github.io/MotionCrafter_Page/) | [💻 Code](https://github.com/TencentARC/MotionCrafter) | [📜 License](LICENSE.txt)
## Model Description

MotionCrafter is a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos. For each frame, it predicts dense point maps and scene flow in a shared world coordinate system, without requiring post-optimization.

## Intended Use

- Research on 4D reconstruction and motion estimation from monocular videos
- Academic evaluation and benchmarking of dense point map and scene flow prediction

Not intended for safety-critical or real-time production use.

## Limitations

- Performance can degrade under extreme motion blur or severe occlusion.
- Output quality is sensitive to input resolution and video quality.
- Generalization may be limited for out-of-domain scenes.

## Training Data

Training data and preprocessing are described in the paper and the main repository; for dataset specifics, see the project page and the paper.

## Evaluation

See the paper for evaluation datasets, metrics, and results.

## How to Use

The snippet below loads the UNet and the 4D VAE from this repository, then builds either the deterministic or the diffusion pipeline on top of Stable Video Diffusion. An illustrative inference sketch appears at the end of this card.

```python
import torch
from motioncrafter import (
    MotionCrafterDiffPipeline,
    MotionCrafterDetermPipeline,
    UnifyAutoencoderKL,
    UNetSpatioTemporalConditionModelVid2vid,
)

unet_path = "TencentARC/MotionCrafter"
vae_path = "TencentARC/MotionCrafter"
model_type = "determ"  # or "diff" for the diffusion variant
cache_dir = "./pretrained_models"

# UNet: pick the subfolder matching the chosen variant; runs in fp16.
unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
    unet_path,
    subfolder='unet_diff' if model_type == 'diff' else 'unet_determ',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float16)

# 4D VAE: joint geometry and motion representation, kept in fp32.
geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
    vae_path,
    subfolder='geometry_motion_vae',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float32)

# Assemble the pipeline on top of Stable Video Diffusion,
# swapping in the MotionCrafter UNet.
if model_type == 'diff':
    pipe = MotionCrafterDiffPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
        cache_dir=cache_dir,
    ).to("cuda")
else:
    pipe = MotionCrafterDetermPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
        cache_dir=cache_dir,
    ).to("cuda")
```

## Model Weights

- `geometry_motion_vae/`: 4D VAE for the joint geometry and motion representation
- `unet_determ/`: deterministic UNet for motion prediction
- `unet_diff/`: diffusion UNet for probabilistic motion prediction

## Model Variants

- Deterministic (`unet_determ`): fast inference with a fixed prediction per input
- Diffusion (`unet_diff`): probabilistic predictions with diverse outputs

## Citation

```bibtex
@article{zhu2025motioncrafter,
  title={MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE},
  author={Zhu, Ruijie and Lu, Jiahao and Hu, Wenbo and Han, Xiaoguang and Cai, Jianfei and Shan, Ying and Zheng, Chuanxia},
  journal={arXiv preprint arXiv:2602.08961},
  year={2026}
}
```

## License

This model is provided under the Tencent License. See [LICENSE.txt](LICENSE.txt) for details.

## Acknowledgments

This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.
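## Inference Sketch

The loading code above builds `pipe` but does not show a forward pass. The sketch below is a minimal, hypothetical illustration: the video-loading step uses standard `imageio` and `torch` calls, while the final `pipe(...)` call, its arguments, and its return structure are assumptions rather than the confirmed MotionCrafter API. Consult the [main repository](https://github.com/TencentARC/MotionCrafter) for the exact interface.

```python
import imageio.v3 as iio
import numpy as np
import torch

# Load a monocular input video as a float tensor in [0, 1].
# "example.mp4" is a placeholder path; the pyav plugin requires `pip install av`.
frames = np.asarray(iio.imread("example.mp4", plugin="pyav"))  # (T, H, W, 3), uint8
video = torch.from_numpy(frames).float().div(255.0)            # (T, H, W, 3), float32

# Hypothetical call: the argument and output names are assumptions for illustration.
# Per the model description, the pipeline is expected to return per-frame point maps
# and scene flow in a shared world coordinate system.
with torch.inference_mode():
    outputs = pipe(video)
```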