---
language: [en]
license: other
library_name: motioncrafter
tags:
- motion
- video
- 4d
- diffusion
- scene-flow
pipeline_tag: image-to-3d
base_model: stabilityai/stable-video-diffusion-img2vid-xt
---
<h1 align="center" style="font-size: 1.6em;">MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE</h1>
<div align="center">
[Ruijie Zhu](https://ruijiezhu94.github.io/ruijiezhu/)<sup>1,2</sup>,
[Jiahao Lu](https://scholar.google.com/citations?user=cRpteW4AAAAJ&hl=en)<sup>3</sup>,
[Wenbo Hu](https://wbhu.github.io/)<sup>2</sup>,
[Xiaoguang Han](https://scholar.google.com/citations?user=z-rqsR4AAAAJ&hl=en)<sup>4</sup><br>
[Jianfei Cai](https://jianfei-cai.github.io/)<sup>5</sup>,
[Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)<sup>2</sup>,
[Chuanxia Zheng](https://physicalvision.github.io/people/~chuanxia)<sup>1</sup>
<sup>1</sup> NTU &nbsp; <sup>2</sup> ARC Lab, Tencent PCG &nbsp; <sup>3</sup> HKUST &nbsp; <sup>4</sup> CUHK(SZ) &nbsp; <sup>5</sup> Monash University
[📄 Paper](https://arxiv.org/abs/2602.08961) | [🌐 Project Page](https://ruijiezhu94.github.io/MotionCrafter_Page/) | [💻 Code](https://github.com/TencentARC/MotionCrafter) | [📜 License](LICENSE.txt)
</div>
## Model Description
MotionCrafter is a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos. It predicts dense point maps and scene flow for each frame within a shared world coordinate system, without requiring post-optimization.
## Intended Use
- Research on 4D reconstruction and motion estimation from monocular videos
- Academic evaluation and benchmarking of dense point map and scene flow prediction
Not intended for safety-critical or real-time production use.
## Limitations
- Performance can degrade with extreme motion blur or severe occlusion.
- Output quality is sensitive to input resolution and video quality.
- Generalization may be limited for out-of-domain scenes.
## Training Data
Training data details and preprocessing are described in the paper and the main repository; for dataset specifics, please refer to the project page and the paper.
## Evaluation
Please refer to the paper for evaluation datasets, metrics, and results.
## How to Use
```python
import torch

from motioncrafter import (
    MotionCrafterDiffPipeline,
    MotionCrafterDetermPipeline,
    UnifyAutoencoderKL,
    UNetSpatioTemporalConditionModelVid2vid,
)

unet_path = "TencentARC/MotionCrafter"
vae_path = "TencentARC/MotionCrafter"
model_type = "determ"  # or "diff" for the diffusion variant
cache_dir = "./pretrained_models"

# Load the UNet for the chosen variant (fp16 on GPU).
unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
    unet_path,
    subfolder="unet_diff" if model_type == "diff" else "unet_determ",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float16)

# Load the 4D VAE for the joint geometry and motion representation (kept in fp32).
geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
    vae_path,
    subfolder="geometry_motion_vae",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float32)

# Build the pipeline on top of Stable Video Diffusion weights.
pipeline_cls = (
    MotionCrafterDiffPipeline if model_type == "diff" else MotionCrafterDetermPipeline
)
pipe = pipeline_cls.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
    cache_dir=cache_dir,
).to("cuda")
```
## Model Weights
- geometry_motion_vae/: 4D VAE for joint geometry and motion representation
- unet_determ/: deterministic UNet for motion prediction
- unet_diff/: diffusion UNet for probabilistic motion prediction
## Model Variants
- Deterministic (unet_determ): fast inference with fixed predictions per input
- Diffusion (unet_diff): probabilistic predictions with diverse outputs
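
The two variants map to the UNet subfolders used in the loading code above. A minimal sketch of that mapping (the helper name `variant_config` is hypothetical, not part of the released API):

```python
def variant_config(model_type: str) -> dict:
    """Return the UNet subfolder and a short description for a MotionCrafter variant.

    `model_type` must be "determ" (deterministic) or "diff" (diffusion).
    """
    configs = {
        "determ": {
            "subfolder": "unet_determ",
            "description": "fast inference with a fixed prediction per input",
        },
        "diff": {
            "subfolder": "unet_diff",
            "description": "probabilistic predictions with diverse outputs",
        },
    }
    if model_type not in configs:
        raise ValueError(f"model_type must be 'determ' or 'diff', got {model_type!r}")
    return configs[model_type]

print(variant_config("determ")["subfolder"])  # unet_determ
```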
## Citation
```bibtex
@article{zhu2025motioncrafter,
title={MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE},
author={Zhu, Ruijie and Lu, Jiahao and Hu, Wenbo and Han, Xiaoguang and Cai, Jianfei and Shan, Ying and Zheng, Chuanxia},
journal={arXiv preprint arXiv:2602.08961},
year={2026}
}
```
## License
This model is provided under the Tencent License. See [LICENSE.txt](LICENSE.txt) for details.
## Acknowledgments
This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.