RuijieZhu committed on
Commit
b97dcb7
·
1 Parent(s): 39b9966

update readme

Files changed (1)
  1. README.md +139 -0
README.md ADDED
@@ -0,0 +1,139 @@
# MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

<div align="center">

[Ruijie Zhu](https://ruijiezhu94.github.io/ruijiezhu/)<sup>1,2</sup>,
[Jiahao Lu](https://scholar.google.com/citations?user=cRpteW4AAAAJ&hl=en)<sup>3</sup>,
[Wenbo Hu](https://wbhu.github.io/)<sup>2</sup>,
[Xiaoguang Han](https://scholar.google.com/citations?user=z-rqsR4AAAAJ&hl=en)<sup>4</sup>,
[Jianfei Cai](https://jianfei-cai.github.io/)<sup>5</sup>,
[Ying Shan](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)<sup>2</sup>,
[Chuanxia Zheng](https://physicalvision.github.io/people/~chuanxia)<sup>1</sup>

<sup>1</sup> NTU &nbsp; <sup>2</sup> ARC Lab, Tencent PCG &nbsp; <sup>3</sup> HKUST &nbsp; <sup>4</sup> CUHK(SZ) &nbsp; <sup>5</sup> Monash University

[📄 Paper](https://arxiv.org/abs/xxxxx) | [🌐 Project Page](https://ruijiezhu94.github.io/MotionCrafter_Page/) | [💻 Code](https://github.com/TencentARC/MotionCrafter) | [📜 License](LICENSE.txt)

</div>

---

## Overview

This repository contains the pretrained model weights for **MotionCrafter**, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos.

MotionCrafter simultaneously predicts:
- **Dense point maps**: 3D coordinates in world space for each pixel
- **Scene flow**: Per-pixel motion estimation across frames

All predictions are made within a unified world coordinate system, without requiring post-optimization.
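
To make the relationship between the two outputs concrete, here is a minimal NumPy sketch. It is not the MotionCrafter API: it assumes point maps and scene flow are arrays of shape `(T, H, W, 3)` in the shared world frame, and that the flow at frame `t` is the forward displacement of each point toward frame `t + 1`. The function name `advect_points`, the array layout, and the flow convention are all illustrative assumptions.

```python
import numpy as np


def advect_points(point_maps: np.ndarray, scene_flow: np.ndarray) -> np.ndarray:
    """Move every 3D point along its scene-flow vector.

    Both inputs are assumed to have shape (T, H, W, 3) and to live in the
    same world coordinate system (hypothetical layout for illustration).
    """
    if point_maps.shape != scene_flow.shape or point_maps.shape[-1] != 3:
        raise ValueError("expected two arrays of shape (T, H, W, 3)")
    return point_maps + scene_flow


# Toy example: 2 frames of 4x6 pixels, every point moves 0.1 units along x.
points = np.zeros((2, 4, 6, 3), dtype=np.float32)
flow = np.zeros_like(points)
flow[..., 0] = 0.1

moved = advect_points(points, flow)
# Per-pixel displacement magnitude, shape (T, H, W).
speed = np.linalg.norm(flow, axis=-1)
```

Because both quantities share one world coordinate system, displacing the points is a plain addition under these assumptions, with no per-frame camera transform in between.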

## Model Weights

This repository includes the following pretrained models:

### 1. Geometry Motion VAE (`geometry_motion_vae/`)
- **Purpose**: Encodes 4D geometry and motion information into a latent space
- **Architecture**: 4D VAE for joint geometry and motion representation
- **Input**: Videos with associated geometry and motion annotations
- **Output**: Compressed 4D latent codes

### 2. UNet Deterministic (`unet_determ/`)
- **Purpose**: Predicts dense geometry and motion from video frames
- **Architecture**: Deterministic UNet conditioned on video input
- **Input**: Video frames
- **Output**: Dense point maps and scene flow predictions

## Usage

### Basic Usage

Load the pretrained models using the MotionCrafter library:

```python
import torch
from motioncrafter import (
    MotionCrafterDiffPipeline,
    MotionCrafterDetermPipeline,
    UnifyAutoencoderKL,
    UNetSpatioTemporalConditionModelVid2vid
)

# Paths to model weights (or use HuggingFace repo ID)
unet_path = "TencentARC/MotionCrafter"
vae_path = "TencentARC/MotionCrafter"
model_type = "determ"  # or "diff" for diffusion version
cache_dir = "./pretrained_models"

# Load UNet model for motion generation
unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
    unet_path,
    subfolder='unet_diff' if model_type == 'diff' else 'unet_determ',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    cache_dir=cache_dir
).requires_grad_(False).to("cuda", dtype=torch.float16)

# Load geometry and motion VAE for point map decoding
geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
    vae_path,
    subfolder='geometry_motion_vae',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,
    cache_dir=cache_dir
).requires_grad_(False).to("cuda", dtype=torch.float32)

# Initialize pipeline based on model type
if model_type == 'diff':
    pipe = MotionCrafterDiffPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
        cache_dir=cache_dir
    ).to("cuda")
else:
    pipe = MotionCrafterDetermPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        unet=unet,
        torch_dtype=torch.float16,
        variant="fp16",
        cache_dir=cache_dir
    ).to("cuda")

# Your inference code here...
```

### Model Variants

- **Deterministic (`unet_determ`)**: Fast inference with fixed predictions per input
- **Diffusion (`unet_diff`)**: Probabilistic predictions with diverse outputs
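
As a hedged sketch of how a variant name maps to the subfolders named above, the helpers below build the file patterns for one variant and can optionally pre-fetch only those folders with `huggingface_hub.snapshot_download`. The helper names (`unet_subfolder`, `weight_patterns`, `download_weights`) are hypothetical and not part of the MotionCrafter library, and whether a `unet_diff` folder is actually published in this repository is an assumption, since the Model Weights section lists only `unet_determ/`.

```python
def unet_subfolder(model_type: str) -> str:
    """Map a variant name to the UNet subfolder named in this README."""
    subfolders = {"determ": "unet_determ", "diff": "unet_diff"}
    if model_type not in subfolders:
        raise ValueError(
            f"unknown model_type {model_type!r}; expected 'determ' or 'diff'"
        )
    return subfolders[model_type]


def weight_patterns(model_type: str) -> list[str]:
    """Glob patterns covering the chosen UNet plus the shared VAE."""
    return [f"{unet_subfolder(model_type)}/*", "geometry_motion_vae/*"]


def download_weights(model_type: str = "determ",
                     cache_dir: str = "./pretrained_models") -> str:
    """Fetch only the needed subfolders (needs network and huggingface_hub)."""
    # Imported lazily so the pure helpers above work without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="TencentARC/MotionCrafter",
        allow_patterns=weight_patterns(model_type),
        cache_dir=cache_dir,
    )
```

Pre-fetching is optional: the `from_pretrained` calls in Basic Usage already download into `cache_dir` on first use.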

For complete inference examples and additional documentation, please refer to the [main repository](https://github.com/TencentARC/MotionCrafter).

## Model Details

- **Framework**: PyTorch
- **Model Format**: `safetensors` (for safe model loading)
- **Resolution**: Supports variable resolutions (e.g., 320×640, 512×1024)
- **Frame Count**: Tested with 25 frames
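
Since the models are tested with 25 frames, a longer video has to be reduced to a 25-frame clip before inference. How the official inference code does this is not stated here, so the following is only one possible approach: a NumPy sketch that picks evenly spaced frame indices. The function name `sample_frame_indices` and the uniform-sampling strategy are assumptions for illustration.

```python
import numpy as np


def sample_frame_indices(num_video_frames: int, clip_length: int = 25) -> np.ndarray:
    """Pick `clip_length` evenly spaced frame indices from a longer video."""
    if num_video_frames < 1 or clip_length < 1:
        raise ValueError("both arguments must be positive")
    # linspace includes both endpoints, so the first and last frames are kept.
    positions = np.linspace(0, num_video_frames - 1, num=clip_length)
    return np.round(positions).astype(int)


# Example: reduce a 120-frame video to a 25-frame clip.
indices = sample_frame_indices(num_video_frames=120)
```

If the video has fewer frames than `clip_length`, some indices repeat, which duplicates frames rather than failing.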

## Citation

If you find MotionCrafter useful for your research, please cite:

```bibtex

```

## License

This model is provided under the Tencent License. Please see [LICENSE.txt](LICENSE.txt) for details.

## Acknowledgments

This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.