---
license: other
library_name: diffusers
tags:
- motion-transfer
- comfyui
- video-generation
- image-to-video
- video-edit
pipeline_tag: video-to-video
base_model:
- alibaba-pai/Wan2.2-Fun-5B-Control
---

<div align="center">

# FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control

<a href="https://arxiv.org/abs/2602.13185"><img src="https://img.shields.io/badge/arXiv-2602.13185-b31b1b.svg" alt="arXiv"></a>
<a href="https://github.com/IGL-HKUST/FlexAM"><img src="https://img.shields.io/badge/GitHub-Repository-181717.svg?logo=github&logoColor=white" alt="GitHub"></a>
<a href="assets/flexam_workflow.json"><img src="https://img.shields.io/badge/ComfyUI-Download_Workflow-4fd63d" alt="ComfyUI"></a>

<br>
<br>

Mingzhi Sheng<sup>1*</sup>, Zekai Gu<sup>2*</sup>, Peng Li<sup>2</sup>, Cheng Lin<sup>3</sup>, Hao-Xiang Guo<sup>4</sup>, Ying-Cong Chen<sup>1,2†</sup>, Yuan Liu<sup>2†</sup>

<br>

<sup>1</sup>HKUST(GZ), <sup>2</sup>HKUST, <sup>3</sup>MUST, <sup>4</sup>Tsinghua University
<br>
<small><sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Authors</small>

</div>

<br>

![teaser](assets/teaser.gif)

## πŸ“° News
- **[2026.02.14]** πŸ“„ The paper is available on arXiv.
- **[2026.02.13]** πŸš€ We have released the inference code and **ComfyUI** support!


## πŸ› οΈ Installation  
> πŸ“’ **System Requirements**: Both the official Python inference code and the ComfyUI workflow were tested on **Ubuntu 20.04** with **Python 3.10**, **PyTorch 2.5.1**, and **CUDA 12.1** on an **NVIDIA A800** GPU.

Before running any inference (Python or ComfyUI), please set up the environment and download the checkpoints.

### 1. Create environment
Clone the repository and create a conda environment:

```
git clone https://github.com/IGL-HKUST/FlexAM
cd FlexAM
conda create -n flexam python=3.10
conda activate flexam
```

Install PyTorch (we recommend `PyTorch 2.5.1` with `CUDA 12.1`), then the remaining dependencies:

```
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
### 2. Download Submodules
We rely on several external modules (MoGe, Pi3, etc.).
 
```
mkdir -p submodules
git submodule update --init --recursive
pip install -r requirements.txt
```
<details>
<summary><em>(Optional) Manual clone if submodule update fails</em></summary>

```
# DELTA
git clone https://github.com/snap-research/DELTA_densetrack3d.git submodules/DELTA_densetrack3d
# Pi3
git clone https://github.com/yyfz/Pi3.git submodules/Pi3
# MoGe
git clone https://github.com/microsoft/MoGe.git submodules/MoGe
# VGGT
git clone https://github.com/facebookresearch/vggt.git submodules/vggt
```

</details>

### 3. Download checkpoints
Download the FlexAM checkpoint and place it in the `checkpoints/` directory.

- HuggingFace Link: [Wan2.2-Fun-5B-FLEXAM](https://huggingface.co/SandwichZ/Wan2.2-Fun-5B-FLEXAM)
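
For example, the checkpoint can be fetched with the Hugging Face CLI (a sketch; the `--local-dir` target here is an assumption and should match wherever your scripts expect the weights):

```shell
# The CLI ships with huggingface_hub; the repo id is taken from the link above.
pip install -U huggingface_hub
huggingface-cli download SandwichZ/Wan2.2-Fun-5B-FLEXAM \
    --local-dir checkpoints/Wan2.2-Fun-5B-FLEXAM
```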



## πŸš€ Inference
We provide two ways to use FlexAM: Python Script and ComfyUI.

### Option A: ComfyUI Integration
We provide a native node for seamless integration into ComfyUI workflows.
> ⚠️ **Note**: Currently, the ComfyUI node supports **Motion Transfer**, **Foreground Edit**, and **Background Edit**. For *Camera Control* and *Object Manipulation*, please use the Python script.
#### 1. Install Node
Since the node is not yet available in the ComfyUI Manager, please install it manually:
```
cd ComfyUI/custom_nodes/
git clone https://github.com/IGL-HKUST/FlexAM
cd FlexAM
pip install -r requirements.txt
```
#### 2. Run Workflow
- Step 1: Download the workflow JSON: [workflow.json](assets/flexam_workflow.json)
- Step 2: Drag and drop it into ComfyUI.
- Step 3: Ensure checkpoints are in `ComfyUI/models/checkpoints`.

### Option B: Python Script

We provide an inference script for all tasks; please refer to `run_demo.sh` to run the `demo.py` script.

Or you can run the tasks one by one as follows.

#### 1. Motion Transfer 
![motion_transfer](assets/motion_transfer.gif)

```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference video path
    --repaint <True/repaint_path> \ # path to a repainted first frame of the source video, or True to repaint it with FLUX
    --video_length=97 \
    --sample_size 512 896 \
    --generate_type='full_edit' \
    --density 10 \ # controls the sparsity of tracking points
    --gpu <gpu_id> # the gpu id
```


#### 2. Foreground Edit
![fg_edit](assets/fg_edit.gif)
```python
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference video path
    --repaint <True/repaint_path > \ # the repaint first frame image path of input source video or use FLUX to repaint the first frame \
    --mask_path <mask_path> \ # White (255) represents the foreground to be edited, and black (0) remains unchanged
    --video_length=97 \ 
    --sample_size 512 896 \
    --generate_type='foreground_edit' \ 
    --dilation_pixels=30 \ # Dilation pixels for mask processing in foreground_edit mode
    --density 10 \ # Control the sparsity of tracking points
    --gpu <gpu_id> \ # the gpu id
```
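
The mask convention above is easy to sanity-check in a few lines. The sketch below is our own illustration (not code from this repo): it builds a white-on-black foreground mask and applies a simple binary dilation, which is roughly the kind of boundary softening `--dilation_pixels` suggests:

```python
import numpy as np

# Hypothetical 64x64 mask for --mask_path: white (255) marks the foreground
# to edit, black (0) stays unchanged (the convention from the flags above).
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 255  # a square foreground region

def dilate(m: np.ndarray, k: int) -> np.ndarray:
    """Binary dilation by k pixels using shifted maxima (4-neighborhood)."""
    out = m.copy()
    for _ in range(k):
        p = np.pad(out, 1, mode="edge")
        out = np.max(
            [p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
             p[1:-1, :-2], p[1:-1, 2:]], axis=0)
    return out

dilated = dilate(mask, 4)  # grows the editable region outward by ~4 px
```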

#### 3. Background Edit
![bg_edit](assets/bg_edit.gif)
```python
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference video path
    --repaint <True/repaint_path > \ # the repaint first frame image path of input source video or use FLUX to repaint the first frame \
    --mask_path <mask_path> \ # White (255) represents the unchanged foreground, while the background indicates the area to be edited
    --video_length=97 \ 
    --sample_size 512 896 \
    --generate_type='background_edit' \
    --density 10 \ # Control the sparsity of tracking points
    --gpu <gpu_id> \ # the gpu id
```

#### 4. Camera Control
![camera_ctrl](assets/camera_ctrl.gif)

We provide three camera control methods: (1) use a predefined template; (2) use a pose text file; (3) input another video, from which Pi3 automatically estimates the camera pose and applies it to the video to be generated.

##### 1. Use predefined templates

We provide several template camera motion types; you can choose one of them. In practice, we find that describing the camera motion in the prompt yields better results.
```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference image or video path
    --camera_motion <camera_motion> \ # the camera motion type, see examples below
    --tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --override_extrinsics <override/append> \ # "override" replaces the original camera, "append" builds upon it
    --video_length=97 \
    --sample_size 512 896 \
    --density 5 \ # controls the sparsity of tracking points
    --gpu <gpu_id> # the gpu id
```

Here are some tips for camera motion:
- trans: translation; the camera moves along the vector (dx, dy, dz), with each component in [-1, 1]
  - Positive X: move left, negative X: move right
  - Positive Y: move down, negative Y: move up
  - Positive Z: zoom in, negative Z: zoom out
  - e.g., 'trans -0.1 -0.1 -0.1' moves right, up, and zooms out
  - e.g., 'trans -0.1 0.0 0.0 5 45' moves right by 0.1 from frame 5 to 45
- rot: rotation; the camera rotates around the given axis by the given angle (in degrees)
  - X-axis rotation: positive X: pitch down, negative X: pitch up
  - Y-axis rotation: positive Y: yaw left, negative Y: yaw right
  - Z-axis rotation: positive Z: roll counter-clockwise, negative Z: roll clockwise
  - e.g., 'rot y 25' rotates 25 degrees around the y-axis (yaw left)
  - e.g., 'rot x -30 10 40' rotates -30 degrees around the x-axis (pitch up) from frame 10 to 40
- spiral: spiral motion; the camera moves along a spiral path with the given radius
  - e.g., 'spiral 2' spiral motion with radius 2
  - e.g., 'spiral 2 15 35' spiral motion with radius 2 from frame 15 to 35

Multiple transformations can be combined using a semicolon (;) as the separator:
- e.g., "trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48"
  This will:
  1. Zoom out (z -0.5) from frame 0 to 30
  2. Pitch up (rotate -25 degrees around the x-axis) from frame 0 to 30
  3. Move right (x -0.1) from frame 30 to 48

Notes:
- If start_frame and end_frame are not specified, the motion is applied across all frames (0-48)
- Frames after end_frame maintain the final transformation
- Combined transformations are applied in sequence
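
The motion-string grammar above can be sketched as a small parser. This is our own illustrative reading of the syntax, not the parser `demo.py` actually uses:

```python
# Illustrative parser for --camera_motion strings such as
# "trans 0 0 -0.5 0 30; rot x -25 0 30" (hypothetical, not demo.py's code).
def parse_camera_motion(spec: str, num_frames: int = 49):
    ops = []
    for part in filter(None, (p.strip() for p in spec.split(";"))):
        tokens = part.split()
        kind = tokens[0]
        if kind == "trans":   # trans dx dy dz [start end]
            dx, dy, dz = map(float, tokens[1:4])
            start, end = (int(tokens[4]), int(tokens[5])) if len(tokens) > 4 else (0, num_frames - 1)
            ops.append({"kind": "trans", "vec": (dx, dy, dz), "frames": (start, end)})
        elif kind == "rot":   # rot axis angle [start end]
            axis, angle = tokens[1], float(tokens[2])
            start, end = (int(tokens[3]), int(tokens[4])) if len(tokens) > 3 else (0, num_frames - 1)
            ops.append({"kind": "rot", "axis": axis, "deg": angle, "frames": (start, end)})
        elif kind == "spiral":  # spiral radius [start end]
            radius = float(tokens[1])
            start, end = (int(tokens[2]), int(tokens[3])) if len(tokens) > 2 else (0, num_frames - 1)
            ops.append({"kind": "spiral", "radius": radius, "frames": (start, end)})
        else:
            raise ValueError(f"unknown motion: {kind}")
    return ops

ops = parse_camera_motion("trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48")
```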


##### 2. Use a pose text file (pose txt)

```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference image or video path
    --camera_motion "path" \ # when the motion type is "path", --pose_file is required
    --pose_file <pose_file_txt> \ # text file of camera poses; each line corresponds to one frame
    --tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --override_extrinsics <override/append> \ # "override" replaces the original camera, "append" builds upon it
    --video_length=97 \
    --sample_size 512 896 \
    --density 5 \ # controls the sparsity of tracking points
    --gpu <gpu_id> # the gpu id
```
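
As a rough illustration of the per-line pose format, the sketch below assumes each line stores a flattened 3x4 [R|t] extrinsic (12 floats) and lifts it to a 4x4 matrix. The exact layout `demo.py` expects is not documented here, so treat the 12-float assumption (and the `load_pose_file` helper) as hypothetical:

```python
import os
import tempfile

import numpy as np

def load_pose_file(path: str) -> np.ndarray:
    """Read one camera pose per line; assumes 12 floats = flattened 3x4 [R|t]."""
    poses = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            vals = [float(v) for v in line.split()]
            if len(vals) != 12:
                raise ValueError(f"expected 12 floats per line, got {len(vals)}")
            m = np.eye(4)
            m[:3, :4] = np.asarray(vals).reshape(3, 4)
            poses.append(m)
    return np.stack(poses)  # shape: (num_frames, 4, 4)

# Tiny demo: three identity poses written to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("1 0 0 0 0 1 0 0 0 0 1 0\n" * 3)
    tmp_path = f.name
poses = load_pose_file(tmp_path)
os.unlink(tmp_path)
```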

##### 3. Input another video to extract the camera pose

```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir <output_dir> \ # output directory
    --input_path <input_path> \ # the reference image or video path
    --camera_motion "path" \ # when the motion type is "path", --pose_file is required
    --pose_file <pose_file_mp4> \ # Pi3 automatically estimates the camera pose from this video file
    --tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --override_extrinsics <override/append> \ # "override" replaces the original camera, "append" builds upon it
    --video_length=97 \
    --sample_size 512 896 \
    --density 5 \ # controls the sparsity of tracking points
    --gpu <gpu_id> # the gpu id
```


#### 5. Object Manipulation
![object](assets/object.gif)
We provide several template object manipulation types; you can choose one of them. In practice, we find that describing the object motion in the prompt yields better results.
```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --input_path <input_path> \ # the reference image path
    --object_motion <object_motion> \ # the object motion type (up, down, left, right)
    --object_mask <object_mask_path> \ # the object mask path
    --tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --sample_size 512 896 \
    --video_length=49 \
    --density 30 \ # controls the sparsity of tracking points
    --gpu <gpu_id> # the gpu id
```
Note that, depending on the tracker you choose, you may need to adjust the scale of the translation.
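
To make that scale remark concrete, here is a hypothetical sketch of how a template direction could become per-frame pixel offsets for the object's tracking points; the `scale` factor is the knob you would tune per tracker (none of these names come from `demo.py`):

```python
# Map each template --object_motion direction to a unit image-plane vector
# (y grows downward in image coordinates).
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def object_offsets(motion: str, num_frames: int, scale: float = 2.0):
    """Linearly increasing (dx, dy) pixel displacement for each frame."""
    dx, dy = DIRECTIONS[motion]
    return [(dx * scale * t, dy * scale * t) for t in range(num_frames)]

offsets = object_offsets("right", 49)  # matches --video_length=49 above
```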


## πŸ™ Acknowledgements

This project builds upon several excellent open source projects:

* [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)

* [DELTA](https://github.com/snap-research/DELTA_densetrack3d)

* [MoGe](https://github.com/microsoft/MoGe)

* [vggt](https://github.com/facebookresearch/vggt)

* [Pi3](https://github.com/yyfz/Pi3)

We thank the authors and contributors of these projects for their valuable contributions to the open source community!

## 🌟 Citation
If you find FlexAM useful for your research, please cite our paper:
```
@misc{sheng2026FlexAM,
  title={FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control},
  author={Sheng, Mingzhi and Gu, Zekai and Li, Peng and Lin, Cheng and Guo, Hao-Xiang and Chen, Ying-Cong and Liu, Yuan},
  year={2026},
  eprint={2602.13185},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.13185},
}
```
## βš–οΈ License

This model checkpoint is based on **FlexAM**.

- **Model Architecture / Code**: Licensed under **Apache 2.0** (or CC-BY-SA 4.0, consistent with the GitHub repository).
- **Embedded DELTA Weights**: This checkpoint contains weights from **DELTA (Snap Inc.)**, which are restricted to **Non-Commercial, Research-Only** use.

**⚠️ Usage Note:**
By downloading or using these weights, you agree to comply with the **Snap Inc. License** regarding the DELTA modules. Please refer to the [LICENSE](./LICENSE) file in this repository for the full text.