RuijieZhu committed on
Commit 6eff5c3 · 1 Parent(s): b97dcb7

update model card info

  1. README.md +44 -43
README.md CHANGED
@@ -1,3 +1,16 @@
1
2   # MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
3
@@ -17,40 +30,32 @@
17
18   </div>
19
20 - ---
21
22 - ## Overview
23
24 - This repository contains the pretrained model weights for **MotionCrafter**, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos.
25
26
27 - MotionCrafter simultaneously predicts:
28 - - **Dense point maps**: 3D coordinates in world space for each pixel
29 - - **Scene flow**: Per-pixel motion estimation across frames
30
31 - All predictions are made within a unified world coordinate system, without requiring post-optimization.
32
33 - ## Model Weights
34
35 - This repository includes the following pretrained models:
36
37 - ### 1. Geometry Motion VAE (`geometry_motion_vae/`)
38 - - **Purpose**: Encodes 4D geometry and motion information into a latent space
39 - - **Architecture**: 4D VAE for joint geometry and motion representation
40 - - **Input**: Videos with associated geometry and motion annotations
41 - - **Output**: Compressed 4D latent codes
42
43 - ### 2. UNet Deterministic (`unet_determ/`)
44 - - **Purpose**: Predicts dense geometry and motion from video frames
45 - - **Architecture**: Deterministic UNet conditioned on video input
46 - - **Input**: Video frames
47 - - **Output**: Dense point maps and scene flow predictions
48
49 - ## Usage
50
51 - ### Basic Usage
52 -
53 - Load the pretrained models using the MotionCrafter library:
54
55   ```python
56   import torch
@@ -61,13 +66,11 @@ from motioncrafter import (
61       UNetSpatioTemporalConditionModelVid2vid
62   )
63
64 - # Paths to model weights (or use HuggingFace repo ID)
65   unet_path = "TencentARC/MotionCrafter"
66   vae_path = "TencentARC/MotionCrafter"
67   model_type = "determ" # or "diff" for diffusion version
68   cache_dir = "./pretrained_models"
69
70 - # Load UNet model for motion generation
71   unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
72       unet_path,
73       subfolder='unet_diff' if model_type == 'diff' else 'unet_determ',
@@ -76,7 +79,6 @@ unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
76       cache_dir=cache_dir
77   ).requires_grad_(False).to("cuda", dtype=torch.float16)
78
79 - # Load geometry and motion VAE for point map decoding
80   geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
81       vae_path,
82       subfolder='geometry_motion_vae',
@@ -85,7 +87,6 @@ geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
85       cache_dir=cache_dir
86   ).requires_grad_(False).to("cuda", dtype=torch.float32)
87
88 - # Initialize pipeline based on model type
89   if model_type == 'diff':
90       pipe = MotionCrafterDiffPipeline.from_pretrained(
91           "stabilityai/stable-video-diffusion-img2vid-xt",
@@ -102,38 +103,38 @@ else:
102           variant="fp16",
103           cache_dir=cache_dir
104       ).to("cuda")
105 -
106 - # Your inference code here...
107   ```
108
109 - ### Model Variants
110 -
111 - - **Deterministic (`unet_determ`)**: Fast inference with fixed predictions per input
112 - - **Diffusion (`unet_diff`)**: Probabilistic predictions with diverse outputs
113
114 - For complete inference examples and additional documentation, please refer to the [main repository](https://github.com/TencentARC/MotionCrafter).
115
116 - ## Model Details
117
118 - - **Framework**: PyTorch
119 - - **Model Format**: `safetensors` (for safe model loading)
120 - - **Resolution**: Supports variable resolutions (e.g., 320×640, 512×1024)
121 - - **Frame Count**: Tested with 25 frames
122
123   ## Citation
124
125 - If you find MotionCrafter useful for your research, please cite:
126 -
127   ```bibtex
128 -
129   ```
130
131   ## License
132
133 - This model is provided under the Tencent License. Please see [LICENSE.txt](LICENSE.txt) for details.
134
135   ## Acknowledgments
136
137   This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.
138
139
1 + ---
2 + language: [en]
3 + license: other
4 + library_name: motioncrafter
5 + tags:
6 + - motion
7 + - video
8 + - 4d
9 + - diffusion
10 + - scene-flow
11 + pipeline_tag: image-to-3d
12 + base_model: stabilityai/stable-video-diffusion-img2vid-xt
13 + ---
14
15   # MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
16
 
30
31   </div>
32
33 + ## Model Description
34
35 + MotionCrafter is a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense object motion from monocular videos. It predicts dense point maps and scene flow for each frame within a shared world coordinate system, without requiring post-optimization.
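To make the relationship between the two predicted quantities concrete, here is a minimal sketch. It assumes that scene flow is a per-pixel 3D displacement expressed in the same world frame as the point map, so adding it to a point gives that point's position in the next frame. The README does not specify the tensor layout or flow convention; the helper name `advect_points` and the nested-list layout are hypothetical and used only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def advect_points(points: List[List[Point]], flow: List[List[Point]]) -> List[List[Point]]:
    """Move every pixel's 3D point by its scene-flow vector.

    Assumption (not stated in the README): points[y][x] is the world-space
    position seen at pixel (x, y) in frame t, and flow[y][x] is the 3D
    displacement of that same surface point from frame t to frame t + 1.
    """
    if len(points) != len(flow):
        raise ValueError("point map and scene flow must have the same height")
    moved = []
    for point_row, flow_row in zip(points, flow):
        if len(point_row) != len(flow_row):
            raise ValueError("point map and scene flow must have the same width")
        moved.append([
            tuple(p + f for p, f in zip(point, displacement))
            for point, displacement in zip(point_row, flow_row)
        ])
    return moved


# Toy 1x2 "image": the left pixel is static, the right pixel moves.
points_t = [[(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]]
flow_t = [[(0.0, 0.0, 0.0), (0.5, 0.0, -0.25)]]
points_next = advect_points(points_t, flow_t)
```

Because both quantities are assumed to live in one world frame, no camera pose is needed for this step; real outputs would be tensors rather than nested lists.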
36
37 + ## Intended Use
38
39 + - Research on 4D reconstruction and motion estimation from monocular videos
40 + - Academic evaluation and benchmarking of dense point map and scene flow prediction
41
42 + Not intended for safety-critical or real-time production use.
43
44 + ## Limitations
45
46 + - Performance can degrade with extreme motion blur or severe occlusion.
47 + - Output quality is sensitive to input resolution and video quality.
48 + - Generalization may be limited for out-of-domain scenes.
49
50 + ## Training Data
51
52 + Training data details and preprocessing are described in the paper and main repository. If you need dataset specifics, please refer to the project page and the paper.
53
54 + ## Evaluation
55
56 + Please refer to the paper for evaluation datasets, metrics, and results.
57
58 + ## How to Use
59
60   ```python
61   import torch

66       UNetSpatioTemporalConditionModelVid2vid
67   )
68
69   unet_path = "TencentARC/MotionCrafter"
70   vae_path = "TencentARC/MotionCrafter"
71   model_type = "determ" # or "diff" for diffusion version
72   cache_dir = "./pretrained_models"
73
74   unet = UNetSpatioTemporalConditionModelVid2vid.from_pretrained(
75       unet_path,
76       subfolder='unet_diff' if model_type == 'diff' else 'unet_determ',

79       cache_dir=cache_dir
80   ).requires_grad_(False).to("cuda", dtype=torch.float16)
81
82   geometry_motion_vae = UnifyAutoencoderKL.from_pretrained(
83       vae_path,
84       subfolder='geometry_motion_vae',

87       cache_dir=cache_dir
88   ).requires_grad_(False).to("cuda", dtype=torch.float32)
89
90   if model_type == 'diff':
91       pipe = MotionCrafterDiffPipeline.from_pretrained(
92           "stabilityai/stable-video-diffusion-img2vid-xt",

103           variant="fp16",
104           cache_dir=cache_dir
105       ).to("cuda")
106   ```
107
108 + ## Model Weights
109
110 + - geometry_motion_vae/: 4D VAE for joint geometry and motion representation
111 + - unet_determ/: deterministic UNet for motion prediction
112
113 + ## Model Variants
114
115 + - Deterministic (unet_determ): fast inference with fixed predictions per input
116 + - Diffusion (unet_diff): probabilistic predictions with diverse outputs
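The variant name decides which UNet subfolder gets loaded, as in the `subfolder=` expression of the How to Use snippet. A minimal sketch of that mapping with explicit validation follows. `unet_subfolder` is a hypothetical helper, not part of the `motioncrafter` package, and the only subfolder names it knows are the two that appear in this README.

```python
# Hypothetical helper; the two subfolder names are taken from this README.
UNET_SUBFOLDERS = {
    "determ": "unet_determ",  # deterministic variant
    "diff": "unet_diff",      # diffusion variant
}


def unet_subfolder(model_type: str) -> str:
    """Return the checkpoint subfolder for a variant name, rejecting unknown names."""
    try:
        return UNET_SUBFOLDERS[model_type]
    except KeyError:
        valid = ", ".join(sorted(UNET_SUBFOLDERS))
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of: {valid}"
        ) from None


subfolder = unet_subfolder("determ")
```

Unlike the inline conditional in the README, which falls back to `unet_determ` for any value other than `"diff"`, this sketch raises on a misspelled variant name instead of silently loading the deterministic weights.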
 
 
117
 
118
  ## Citation
119
 
 
 
120
  ```bibtex
121
+ @inproceedings{zhu2025motioncrafter,
122
+ title={MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE},
123
+ author={Zhu, Ruijie and Lu, Jiahao and Hu, Wenbo and Han, Xiaoguang and Cai, Jianfei and Shan, Ying and Zheng, Chuanxia},
124
+ booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
125
+ year={2025}
126
+ }
127
  ```
128
 
129
  ## License
130
 
131
+ This model is provided under the Tencent License. See [LICENSE.txt](LICENSE.txt) for details.
132
 
133
  ## Acknowledgments
134
 
135
  This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.
136
 
137
 
138
+ This work builds upon [GeometryCrafter](https://github.com/TencentARC/GeometryCrafter). We thank the authors for their excellent contributions.
139
+
140
+