|
|
--- |
|
|
language: [en] |
|
|
license: other |
|
|
library_name: motioncrafter |
|
|
tags: |
|
|
- motion |
|
|
- video |
|
|
- 4d |
|
|
- diffusion |
|
|
- scene-flow |
|
|
pipeline_tag: image-to-3d |
|
|
base_model: stabilityai/stable-video-diffusion-img2vid-xt |
|
|
--- |
|
|
|
|
|
<h1 align="center" style="font-size: 1.6em;">MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE</h1> |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
[Ruijie Zhu](https://ruijiezhu94.github.io/ruijiezhu/)<sup>1,2</sup>, |
|
|
[Jiahao Lu](https://scholar.google.com/citations?user=cRpteW4AAAAJ&hl=en)<sup>3</sup>, |
|
|
[Wenbo Hu](https://wbhu.github.io/)<sup>2</sup>, |
|
|
[Xiaoguang Han](https://scholar.google.com/citations?user=z-rqsR4AAAAJ&hl=en)<sup>4</sup><br> |
|
|
[Jianfei Cai](https://jianfei-cai.github.io/)<sup>5</sup>, |
|
|
[Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)<sup>2</sup>, |
|
|
[Chuanxia Zheng](https://physicalvision.github.io/people/~chuanxia)<sup>1</sup> |
|
|
|
|
|
<sup>1</sup> NTU <sup>2</sup> ARC Lab, Tencent PCG <sup>3</sup> HKUST <sup>4</sup> CUHK(SZ) <sup>5</sup> Monash University |
|
|
|
|
|
[📄 Paper](https://arxiv.org/abs/2602.08961) | [🌐 Project Page](https://ruijiezhu94.github.io/MotionCrafter_Page/) | [💻 Code](https://github.com/TencentARC/MotionCrafter) | [📜 License](LICENSE.txt)
|
|
|
|
|
</div> |
|
|
|
|
|
## Model Description |
|
|
|
|
|
MotionCrafter is a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos. It predicts dense point maps and scene flow for each frame within a shared world coordinate system, without requiring post-optimization. |
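Because point maps and scene flow share one world coordinate system, advecting geometry to the next frame reduces to a per-point addition. A minimal NumPy sketch of this relationship (the shapes here are illustrative only, not the model's actual tensor layout):

```python
import numpy as np

# Illustrative per-frame outputs: an H x W point map of world-space XYZ
# coordinates and a matching per-point 3D scene-flow field (t -> t+1).
H, W = 4, 4
points_t = np.random.rand(H, W, 3)            # geometry at frame t
scene_flow_t = np.random.rand(H, W, 3) * 0.1  # dense 3D motion at frame t

# In a shared world frame, forward advection is plain addition.
points_t1 = points_t + scene_flow_t
print(points_t1.shape)  # (4, 4, 3)
```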
|
|
|
|
|
## Intended Use |
|
|
|
|
|
- Research on 4D reconstruction and motion estimation from monocular videos |
|
|
- Academic evaluation and benchmarking of dense point map and scene flow prediction |
|
|
|
|
|
Not intended for safety-critical or real-time production use. |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Performance can degrade with extreme motion blur or severe occlusion. |
|
|
- Output quality is sensitive to input resolution and video quality. |
|
|
- Generalization may be limited for out-of-domain scenes. |
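Since output quality is sensitive to input resolution, it can help to snap frame dimensions to the backbone's expected granularity before inference. A minimal sketch, assuming a divisor of 64 (typical for SVD-based models; check the main repository for the actual constraint):

```python
def snap_to_multiple(width: int, height: int, divisor: int = 64) -> tuple:
    """Round each side down to the nearest multiple of `divisor`,
    never going below `divisor` itself."""
    def snap(x: int) -> int:
        return max(divisor, x - x % divisor)
    return snap(width), snap(height)

# A 1280 x 725 frame would be resized to 1280 x 704 before inference.
print(snap_to_multiple(1280, 725))  # (1280, 704)
```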
|
|
|
|
|
## Training Data |
|
|
|
|
|
Training data and preprocessing details are described in the paper and the main repository; please refer there for dataset specifics.
|
|
|
|
|
## Evaluation |
|
|
|
|
|
Please refer to the paper for evaluation datasets, metrics, and results. |
|
|
|
|
|
## How to Use |
|
|
|
|
|
```python
import torch

from motioncrafter import (
    MotionCrafterDiffPipeline,
    MotionCrafterDetermPipeline,
    UnifyAutoencoderKL,
    UNetSpatioTemporalConditionModelVid2vid,
)

model_path = "TencentARC/MotionCrafter"
model_type = "determ"  # or "diff" for the diffusion variant
cache_dir = "./pretrained_models"

# Load the UNet (fp16) for the chosen variant.
unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
    model_path,
    subfolder="unet_diff" if model_type == "diff" else "unet_determ",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float16)

# Load the 4D VAE in fp32 for numerically stable geometry/motion decoding.
geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
    model_path,
    subfolder="geometry_motion_vae",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    cache_dir=cache_dir,
).requires_grad_(False).to("cuda", dtype=torch.float32)

# Build the pipeline on top of the Stable Video Diffusion backbone.
pipeline_cls = (
    MotionCrafterDiffPipeline if model_type == "diff" else MotionCrafterDetermPipeline
)
pipe = pipeline_cls.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
    cache_dir=cache_dir,
).to("cuda")
```
|
|
|
|
|
## Model Weights |
|
|
|
|
|
- geometry_motion_vae/: 4D VAE for the joint geometry and motion representation
- unet_determ/: deterministic UNet for motion prediction
- unet_diff/: diffusion UNet for probabilistic motion prediction
|
|
|
|
|
## Model Variants |
|
|
|
|
|
- Deterministic (unet_determ): fast inference with a fixed prediction per input
- Diffusion (unet_diff): probabilistic sampling that can produce diverse outputs
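The practical difference shows up in seeding: the deterministic variant returns the same prediction for a given input, while the diffusion variant draws noise, so fixing a seed makes its samples reproducible and varying it yields diverse outputs. A minimal sketch of seed control with `torch.Generator` (how MotionCrafter's pipelines consume a generator is an assumption borrowed from diffusers-style APIs; check the main repository):

```python
import torch

# Two generators with the same seed produce identical noise draws, so a
# seeded diffusion run is repeatable; different seeds give the diverse
# samples the diffusion variant is designed for.
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
same = torch.equal(torch.randn(4, generator=g1), torch.randn(4, generator=g2))
print(same)  # True
```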
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{zhu2025motioncrafter, |
|
|
title={MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE}, |
|
|
author={Zhu, Ruijie and Lu, Jiahao and Hu, Wenbo and Han, Xiaoguang and Cai, Jianfei and Shan, Ying and Zheng, Chuanxia}, |
|
|
journal={arXiv preprint arXiv:2602.08961}, |
|
|
year={2026} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
This model is provided under the Tencent License. See [LICENSE.txt](LICENSE.txt) for details. |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions. |
|
|
|
|
|
|