---
language:
- en
license: other
library_name: motioncrafter
tags:
- motion
- video
- 4d
- diffusion
- scene-flow
pipeline_tag: image-to-3d
base_model: stabilityai/stable-video-diffusion-img2vid-xt
---

# MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

[Ruijie Zhu](https://ruijiezhu94.github.io/ruijiezhu/)<sup>1,2</sup>, [Jiahao Lu](https://scholar.google.com/citations?user=cRpteW4AAAAJ&hl=en)<sup>3</sup>, [Wenbo Hu](https://wbhu.github.io/)<sup>2</sup>, [Xiaoguang Han](https://scholar.google.com/citations?user=z-rqsR4AAAAJ&hl=en)<sup>4</sup>, [Jianfei Cai](https://jianfei-cai.github.io/)<sup>5</sup>, [Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)<sup>2</sup>, [Chuanxia Zheng](https://physicalvision.github.io/people/~chuanxia)<sup>1</sup>

<sup>1</sup> NTU&nbsp;&nbsp;<sup>2</sup> ARC Lab, Tencent PCG&nbsp;&nbsp;<sup>3</sup> HKUST&nbsp;&nbsp;<sup>4</sup> CUHK(SZ)&nbsp;&nbsp;<sup>5</sup> Monash University

[📄 Paper](https://arxiv.org/abs/2602.08961) | [🌐 Project Page](https://ruijiezhu94.github.io/MotionCrafter_Page/) | [💻 Code](https://github.com/TencentARC/MotionCrafter) | [📜 License](LICENSE.txt)
## Model Description

MotionCrafter is a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos. For each frame, it predicts dense point maps and scene flow in a shared world coordinate system, without requiring post-optimization.

## Intended Use

- Research on 4D reconstruction and motion estimation from monocular videos
- Academic evaluation and benchmarking of dense point map and scene flow prediction

Not intended for safety-critical or real-time production use.

## Limitations

- Performance can degrade under extreme motion blur or severe occlusion.
- Output quality is sensitive to input resolution and video quality.
- Generalization may be limited for out-of-domain scenes.

## Training Data

Training data and preprocessing are described in the paper and the main repository; for dataset specifics, see the project page and the paper.

## Evaluation

See the paper for evaluation datasets, metrics, and results.

## How to Use

The snippet below loads the UNet and the 4D VAE from this repository, then builds either the deterministic or the diffusion pipeline on top of Stable Video Diffusion. An illustrative inference sketch appears at the end of this card.

```python
import torch
from motioncrafter import (
    MotionCrafterDiffPipeline,
    MotionCrafterDetermPipeline,
    UnifyAutoencoderKL,
    UNetSpatioTemporalConditionModelVid2vid,
)

unet_path = "TencentARC/MotionCrafter"
vae_path = "TencentARC/MotionCrafter"
model_type = "determ"  # or "diff" for the diffusion variant
cache_dir = "./pretrained_models"

# UNet: pick the subfolder matching the chosen variant; runs in fp16.
unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
    unet_path,
    subfolder='unet_diff' if model_type == 'diff' else 'unet_determ',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float16)

# 4D VAE: joint geometry and motion representation, kept in fp32.
geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
    vae_path,
    subfolder='geometry_motion_vae',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float32)

# Assemble the pipeline on top of Stable Video Diffusion,
# swapping in the MotionCrafter UNet.
if model_type == 'diff':
    pipe = MotionCrafterDiffPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
        cache_dir=cache_dir,
    ).to("cuda")
else:
    pipe = MotionCrafterDetermPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
        cache_dir=cache_dir,
    ).to("cuda")
```

## Model Weights

- `geometry_motion_vae/`: 4D VAE for the joint geometry and motion representation
- `unet_determ/`: deterministic UNet for motion prediction
- `unet_diff/`: diffusion UNet for probabilistic motion prediction

## Model Variants

- Deterministic (`unet_determ`): fast inference with a fixed prediction per input
- Diffusion (`unet_diff`): probabilistic predictions with diverse outputs

## Citation

```bibtex
@article{zhu2025motioncrafter,
  title={MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE},
  author={Zhu, Ruijie and Lu, Jiahao and Hu, Wenbo and Han, Xiaoguang and Cai, Jianfei and Shan, Ying and Zheng, Chuanxia},
  journal={arXiv preprint arXiv:2602.08961},
  year={2026}
}
```

## License

This model is provided under the Tencent License. See [LICENSE.txt](LICENSE.txt) for details.

## Acknowledgments

This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.
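## Inference Sketch

The loading code above builds `pipe` but does not show a forward pass. The sketch below is a minimal, hypothetical illustration: the video-loading step uses standard `imageio` and `torch` calls, while the final `pipe(...)` call, its arguments, and its return structure are assumptions rather than the confirmed MotionCrafter API. Consult the [main repository](https://github.com/TencentARC/MotionCrafter) for the exact interface.

```python
import imageio.v3 as iio
import numpy as np
import torch

# Load a monocular input video as a float tensor in [0, 1].
# "example.mp4" is a placeholder path; the pyav plugin requires `pip install av`.
frames = np.asarray(iio.imread("example.mp4", plugin="pyav"))  # (T, H, W, 3), uint8
video = torch.from_numpy(frames).float().div(255.0)            # (T, H, W, 3), float32

# Hypothetical call: the argument and output names are assumptions for illustration.
# Per the model description, the pipeline is expected to return per-frame point maps
# and scene flow in a shared world coordinate system.
with torch.inference_mode():
    outputs = pipe(video)
```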