Improve model card: add library_name, abstract, detailed usage, and visuals

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +182 -5
README.md CHANGED
@@ -1,12 +1,189 @@
1
  ---
2
- pipeline_tag: image-to-video
3
  license: mit
4
  ---
5
 
6
- # MagicMotion
7
 
8
- MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance.
9
 
10
- [[Paper](https://arxiv.org/pdf/2503.16421)] [[Project Page](https://quanhaol.github.io/magicmotion-site/)] [[HuggingFace](https://huggingface.co/quanhaol/MagicMotion)]
11
 
12
- For code and sample usage, see https://github.com/quanhaol/MagicMotion.
 
1
  ---
 
2
  license: mit
3
+ pipeline_tag: image-to-video
4
+ library_name: diffusers
5
  ---
6
 
7
+ # MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
8
+
9
+ <a href="https://huggingface.co/papers/2503.16421"><img src="https://img.shields.io/static/v1?label=Paper&message=2503.16421&color=red&logo=arxiv"></a>
10
+ <a href="https://quanhaol.github.io/magicmotion-site/"><img src="https://img.shields.io/static/v1?label=Project&message=Page&color=green&logo=github-pages"></a>
11
+ <a href="https://huggingface.co/quanhaol/MagicMotion"><img src="https://img.shields.io/badge/%F0%9F%A4%97_HuggingFace-Model-ffbd45.svg" alt="HuggingFace Model"></a>
12
+ <a href="https://huggingface.co/datasets/quanhaol/MagicData"><img src="https://img.shields.io/badge/%F0%9F%A4%97_HuggingFace-Dataset-ffbd45.svg" alt="HuggingFace Dataset"></a>
13
+
14
+ <p align="center">
15
+ <img src="https://huggingface.co/quanhaol/MagicMotion/resolve/main/assets/teaser2.webp" width="100%" alt="MagicMotion Teaser Image">
16
+ </p>
17
+
18
+ MagicMotion is a novel image-to-video generation framework that enables trajectory control through three levels of conditions from dense to sparse: masks, bounding boxes, and sparse boxes. Given an input image and trajectories, MagicMotion seamlessly animates objects along defined trajectories while maintaining object consistency and visual quality.
19
+
20
+ ## Abstract
21
+
22
+ Recent advances in video generation have led to remarkable improvements in visual quality and temporal coherence. Building on this progress, trajectory-controllable video generation has emerged to enable precise object motion control through explicitly defined spatial paths. However, existing methods struggle with complex object movements and multi-object motion control, resulting in imprecise trajectory adherence, poor object consistency, and compromised visual quality. Furthermore, these methods support trajectory control in only a single format, limiting their applicability in diverse scenarios. Additionally, there is no publicly available dataset or benchmark specifically tailored for trajectory-controllable video generation, hindering robust training and systematic evaluation. To address these challenges, we introduce **MagicMotion**, a novel image-to-video generation framework that enables trajectory control through three levels of conditions from dense to sparse: masks, bounding boxes, and sparse boxes. Given an input image and trajectories, MagicMotion seamlessly animates objects along defined trajectories while maintaining object consistency and visual quality. Furthermore, we present **MagicData**, a large-scale trajectory-controlled video dataset, along with an automated pipeline for annotation and filtering. We also introduce **MagicBench**, a comprehensive benchmark that assesses both video quality and trajectory control accuracy across different numbers of objects. Extensive experiments demonstrate that MagicMotion outperforms previous methods across various metrics. Our project page is publicly available at https://quanhaol.github.io/magicmotion-site/.
23
+
24
+ <p align="center">
25
+ <img src="https://huggingface.co/quanhaol/MagicMotion/resolve/main/assets/teaser.webp" width="100%" alt="MagicMotion Demo Image">
26
+ </p>
27
+
28
+ ## News
29
+
30
+ - `2025/07/28` 🔥🔥 MagicData has been released [`here`](https://huggingface.co/datasets/quanhaol/MagicData). You are welcome to use our dataset!
31
+ - `2025/06/26` 🔥🔥 MagicMotion has been accepted to ICCV 2025! 🎉🎉🎉
32
+ - `2025/03/28` 🔥🔥 We released an interactive Gradio demo for MagicMotion.
33
+ - `2025/03/27` MagicMotion can now perform inference on a single 4090 GPU (with less than 24GB of GPU memory).
34
+ - `2025/03/21` 🔥🔥 We released MagicMotion, including inference code and model weights.
35
+
36
+ ## Installation
37
+
38
+ To get started with MagicMotion, clone the repository and install the required dependencies:
39
+
40
+ ```bash
41
+ # Clone this repository.
42
+ git clone https://github.com/quanhaol/MagicMotion
43
+ cd MagicMotion
44
+
45
+ # Install requirements
46
+ conda env create -n magicmotion --file environment.yml
47
+ conda activate magicmotion
48
+ pip install git+https://github.com/huggingface/diffusers
49
+
50
+ # Install Grounded_SAM2 for trajectory construction
51
+ cd trajectory_construction/Grounded_SAM2
52
+ pip install -e .
53
+ pip install --no-build-isolation -e grounding_dino
54
+
55
+ # Optional: For image editing
56
+ pip install git+https://github.com/huggingface/image_gen_aux
57
+ ```
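+
+ After installation, you can optionally run a quick sanity check (not part of the official repository) to confirm that PyTorch, CUDA, and `diffusers` are visible from the new environment:
+
+ ```python
+ # Environment sanity check (illustrative only, not from the MagicMotion repo).
+ # Run inside the activated "magicmotion" conda environment.
+ import torch
+ import diffusers
+
+ print("torch:", torch.__version__)
+ print("CUDA available:", torch.cuda.is_available())
+ print("diffusers:", diffusers.__version__)
+ ```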
58
+
59
+ ## Model Weights
60
+
61
+ The model weights are organized into stages within the `ckpts` folder. You can download them using `huggingface-cli`:
62
+
63
+ ### Folder Structure
64
+
65
+ ```
66
+ MagicMotion
67
+ └── ckpts
68
+     ├── stage1
69
+     │   └── mask.pt
70
+     ├── stage2
71
+     │   ├── box.pt
72
+     │   └── box_perception_head.pt
73
+     └── stage3
74
+         ├── sparse_box.pt
75
+         └── sparse_box_perception_head.pt
76
+ ```
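+
+ Each stage corresponds to one control mode (masks, bounding boxes, or sparse boxes). As a rough orientation aid, the sketch below (not an official API; the mapping is inferred from the file names above) checks that the files for each mode are in place once the weights are downloaded (see the next subsection):
+
+ ```python
+ # Illustrative sketch: map each control mode to its stage checkpoints and
+ # verify that the files exist after downloading to ./ckpts.
+ import os
+
+ STAGE_FILES = {
+     "mask": ["ckpts/stage1/mask.pt"],
+     "box": ["ckpts/stage2/box.pt", "ckpts/stage2/box_perception_head.pt"],
+     "sparse_box": ["ckpts/stage3/sparse_box.pt", "ckpts/stage3/sparse_box_perception_head.pt"],
+ }
+
+ for mode, paths in STAGE_FILES.items():
+     missing = [p for p in paths if not os.path.exists(p)]
+     print(f"{mode}: {'ok' if not missing else 'missing: ' + ', '.join(missing)}")
+ ```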
77
+
78
+ ### Download Links
79
+
80
+ ```bash
81
+ pip install "huggingface_hub[hf_transfer]"
82
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download quanhaol/MagicMotion --local-dir ckpts
83
+ ```
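+
+ Equivalently, you can fetch the weights from Python with `huggingface_hub`; a small sketch using the standard `snapshot_download` helper:
+
+ ```python
+ # Download the MagicMotion checkpoints into ./ckpts from Python.
+ # Equivalent to the huggingface-cli command above.
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(repo_id="quanhaol/MagicMotion", local_dir="ckpts")
+ ```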
84
+
85
+ ## Inference
86
+
87
+ Inference requires **only 23GB of GPU memory** (tested on a single 24GB NVIDIA GeForce RTX 4090 GPU).
88
+
89
+ If you have sufficient GPU memory, you can modify `magicmotion/inference.py` to improve runtime performance:
90
+
91
+ ```python
92
+ # Optimized setting (for GPUs with sufficient memory)
93
+ pipe.to("cuda")
94
+ # pipe.enable_sequential_cpu_offload()
95
+ ```
96
+ > **Note**: Using the optimized setting can reduce runtime by up to 2x.
97
+
98
+ ### Python Sample Usage (Conceptual)
99
+
100
+ MagicMotion builds on the `diffusers` library. The full workflow involves custom trajectory construction and a custom pipeline, so the snippet below is only a conceptual sketch of loading the downloaded checkpoints with the generic `DiffusionPipeline` loader; see the official inference scripts for the exact loading logic.
101
+
102
+ ```python
103
+ import torch
104
+ from diffusers import DiffusionPipeline
105
+ from PIL import Image
106
+ import os
107
+
108
+ # Ensure you have cloned the MagicMotion repository and downloaded the weights
109
+ # as described in the "Installation" and "Model Weights" sections above.
110
+ # Example: if your MagicMotion folder is at './MagicMotion'
111
+ magicmotion_root = "./MagicMotion"
112
+ ckpt_path = os.path.join(magicmotion_root, "ckpts")  # stage checkpoints live here
113
+
114
+ # Load the pipeline for a specific stage (e.g., stage 2 for box control).
115
+ # MagicMotion defines a custom pipeline, so the generic `DiffusionPipeline` loader
116
+ # shown here may need to be pointed at a specific subfolder or replaced by the
117
+ # project's own pipeline class. Refer to the official GitHub repository for the
118
+ # precise loading logic used by the inference scripts.
119
+ try:
120
+     pipe = DiffusionPipeline.from_pretrained(
121
+         magicmotion_root,  # or a specific subfolder if a pipeline is defined there
122
+         torch_dtype=torch.float16,
123
+         local_files_only=True,  # assumes the checkpoints are downloaded locally
124
+     )
125
+     pipe.to("cuda")  # move to GPU if memory allows
126
+
127
+     # Placeholder for actual inputs:
128
+     # load your input image (a PIL Image) and generate or load trajectory conditions.
129
+     # For example:
130
+     # input_image = Image.open("your_input_image.png").convert("RGB")
131
+     # trajectory_conditions = {
132
+     #     "bboxes": [[(x1, y1, x2, y2), ...], ...]  # per-frame boxes for each object
133
+     # }
134
+
135
+     # Example inference call (conceptual; exact arguments depend on MagicMotion's pipeline)
136
+     # generated_video_frames = pipe(
137
+     #     image=input_image,
138
+     #     trajectory_conditions=trajectory_conditions,
139
+     #     num_frames=25,
140
+     #     guidance_scale=7.5,
141
+     #     num_inference_steps=50,
142
+     # ).frames
143
+
144
+     # print("Pipeline loaded. Please replace the placeholder inputs with actual data.")
145
+
146
+ except Exception as e:
147
+     print(f"Failed to load the pipeline directly. Please refer to the official GitHub repository's `magicmotion/scripts/inference/` for the exact model loading logic: {e}")
148
+
149
+ ```
150
+ For complete inference scripts and instructions on constructing the different trajectory types (mask, bounding box, sparse box), see the `magicmotion/scripts/inference` and `trajectory_construction` directories of the [official GitHub repository](https://github.com/quanhaol/MagicMotion).
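+
+ To give a flavor of what a trajectory condition looks like, the sketch below builds a simple per-frame bounding-box trajectory for a single object by linearly interpolating a box from a start to an end position. This is an illustrative helper only; the exact box format expected by the official scripts (normalization, frame count, multi-object layout) is defined in the `trajectory_construction` directory.
+
+ ```python
+ # Illustrative: build a per-frame bounding-box trajectory for one object by
+ # linear interpolation between a start box and an end box (x1, y1, x2, y2).
+ def interpolate_boxes(start_box, end_box, num_frames):
+     boxes = []
+     for t in range(num_frames):
+         alpha = t / max(num_frames - 1, 1)
+         boxes.append(tuple(
+             (1 - alpha) * s + alpha * e for s, e in zip(start_box, end_box)
+         ))
+     return boxes
+
+ # Example: move a box from the left to the right of a 720x480 frame over 49 frames.
+ trajectory = interpolate_boxes((40, 180, 200, 340), (520, 180, 680, 340), num_frames=49)
+ print(trajectory[0], trajectory[-1])
+ ```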
151
+
152
+ ## Gradio Demo
153
+
154
+ An interactive Gradio demo is available, which you can run locally:
155
+
156
+ ```bash
157
+ bash magicmotion/scripts/app/app.sh
158
+ ```
159
+
160
+ <img src="https://huggingface.co/quanhaol/MagicMotion/resolve/main/assets/images/gradio/1.png" alt="Gradio Demo Screenshot 1" style="width: 60%; border: 1px solid #ddd; border-radius: 4px; padding: 5px;"> <img src="https://huggingface.co/quanhaol/MagicMotion/resolve/main/assets/images/gradio/2.png" alt="Gradio Demo Screenshot 2" style="width: 60%; border: 1px solid #ddd; border-radius: 4px; padding: 5px;">
161
+
162
+ ## Acknowledgements
163
+
164
+ We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:
165
+
166
+ - [CogVideo](https://github.com/THUDM/CogVideo): An open source video generation framework by THUKEG.
167
+ - [Open-Sora](https://github.com/hpcaitech/Open-Sora): An open source video generation framework by HPC-AI Tech.
168
+ - [finetrainers](https://github.com/a-r-r-o-w/finetrainers): A memory-optimized training library for diffusion models.
169
+
170
+ Special thanks to the contributors of these libraries for their hard work and dedication!
171
+
172
+ ## Citation
173
+
174
+ If you find our work useful, **please consider giving the [GitHub repository](https://github.com/quanhaol/MagicMotion) a star and citing it**:
175
+
176
+ ```bibtex
177
+ @article{li2025magicmotion,
178
+ title={MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance},
179
+ author={Li, Quanhao and Xing, Zhen and Wang, Rui and Zhang, Hui and Dai, Qi and Wu, Zuxuan},
180
+ journal={arXiv preprint arXiv:2503.16421},
181
+ year={2025}
182
+ }
183
+ ```
184
 
185
+ ## Contact
186
 
187
+ If you have any suggestions or find our work helpful, feel free to contact us:
188
 
189
+ Email: liqh24@m.fudan.edu.cn or zhenxingfd@gmail.com