---
license: other
library_name: diffusers
tags:
- motion-transfer
- comfyui
- video-generation
- image-to-video
- video-edit
pipeline_tag: video-to-video
base_model:
- alibaba-pai/Wan2.2-Fun-5B-Control
---
# FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control
Mingzhi Sheng<sup>1*</sup>, Zekai Gu<sup>2*</sup>, Peng Li<sup>2</sup>, Cheng Lin<sup>3</sup>, Hao-Xiang Guo<sup>4</sup>, Ying-Cong Chen<sup>1,2†</sup>, Yuan Liu<sup>2†</sup>

<sup>1</sup>HKUST(GZ), <sup>2</sup>HKUST, <sup>3</sup>MUST, <sup>4</sup>Tsinghua University

*Equal Contribution, †Corresponding Authors

## 📰 News
- **[2026.02.14]** 📄 The paper is available on arXiv.
- **[2026.02.13]** 🚀 We have released the inference code and **ComfyUI** support!
## 🛠️ Installation
> 📢 **System Requirements**: Both the official Python inference code and the ComfyUI workflow were tested on **Ubuntu 20.04** with **Python 3.10**, **PyTorch 2.5.1**, and **CUDA 12.1** on an **NVIDIA A800** GPU.
Before running any inference (Python or ComfyUI), please set up the environment and download the checkpoints.
### 1. Create environment
Clone the repository and create conda environment:
```
git clone https://github.com/IGL-HKUST/FlexAM
cd FlexAM
conda create -n flexam python=3.10
conda activate flexam
```
Install PyTorch; we recommend `PyTorch 2.5.1` with `CUDA 12.1`:
```
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121
```
Then install the remaining dependencies:
```
pip install -r requirements.txt
```
### 2. Download Submodules
We rely on several external modules (DELTA, Pi3, MoGe, and VGGT).
```
mkdir -p submodules
git submodule update --init --recursive
pip install -r requirements.txt
```
(Optional) Clone manually if the submodule update fails:
```
# DELTA
git clone https://github.com/snap-research/DELTA_densetrack3d.git submodules/DELTA_densetrack3d
# Pi3
git clone https://github.com/yyfz/Pi3.git submodules/Pi3
# MoGe
git clone https://github.com/microsoft/MoGe.git submodules/MoGe
# VGGT
git clone https://github.com/facebookresearch/vggt.git submodules/vggt
```
### 3. Download checkpoints
Download the FlexAM checkpoint and place it in the `checkpoints/` directory.
- HuggingFace Link: [Wan2.2-Fun-5B-FLEXAM](https://huggingface.co/SandwichZ/Wan2.2-Fun-5B-FLEXAM)
## 🚀 Inference
We provide two ways to use FlexAM: Python Script and ComfyUI.
### Option A: ComfyUI Integration
We provide a native node for seamless integration into ComfyUI workflows.
> ⚠️ **Note**: Currently, the ComfyUI node supports **Motion Transfer**, **Foreground Edit**, and **Background Edit**. For *Camera Control* and *Object Manipulation*, please use the Python script.
#### 1. Install Node
Since the node is not yet available in the ComfyUI Manager, please install it manually:
```
cd ComfyUI/custom_nodes/
git clone https://github.com/IGL-HKUST/FlexAM
cd FlexAM
pip install -r requirements.txt
```
#### 2. Run Workflow
- Step 1: Download the workflow JSON: [workflow.json](assets/flexam_workflow.json)
- Step 2: Drag and drop it into ComfyUI.
- Step 3: Ensure checkpoints are in `ComfyUI/models/checkpoints`.
### Option B: Python Script
We provide an inference script for all tasks; please refer to `run_demo.sh` for how to invoke `demo.py`.
Alternatively, you can run the tasks individually as follows.
#### 1. Motion Transfer

```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir \ # output directory
    --input_path \ # the reference video path
    --repaint \ # first-frame repaint image path for the source video, or use FLUX to repaint the first frame
    --video_length=97 \
    --sample_size 512 896 \
    --generate_type='full_edit' \
    --density 10 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
#### 2. Foreground Edit

```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir \ # output directory
    --input_path \ # the reference video path
    --repaint \ # first-frame repaint image path for the source video, or use FLUX to repaint the first frame
    --mask_path \ # white (255) marks the foreground to be edited; black (0) remains unchanged
    --video_length=97 \
    --sample_size 512 896 \
    --generate_type='foreground_edit' \
    --dilation_pixels=30 \ # dilation pixels for mask processing in foreground_edit mode
    --density 10 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
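As a reference for the mask convention above (white = 255 marks the region to edit), here is a minimal NumPy sketch of how a binary mask could be dilated by a fixed number of pixels, analogous to what `--dilation_pixels` describes. The helper name and the square-window dilation are illustrative assumptions, not the repository's actual preprocessing.

```python
import numpy as np

def dilate_mask(mask: np.ndarray, pixels: int) -> np.ndarray:
    """Grow the white (255) region of a binary mask by `pixels` in every
    direction, using a simple square structuring element.

    Illustrative sketch only -- FlexAM's actual mask processing may differ.
    """
    binary = mask > 127  # treat values above 127 as foreground
    h, w = binary.shape
    out = np.zeros_like(binary)
    # Shift the mask within the window in every direction and OR the results.
    for dy in range(-pixels, pixels + 1):
        for dx in range(-pixels, pixels + 1):
            dst_y = slice(max(dy, 0), h + min(dy, 0))
            dst_x = slice(max(dx, 0), w + min(dx, 0))
            src_y = slice(max(-dy, 0), h + min(-dy, 0))
            src_x = slice(max(-dx, 0), w + min(-dx, 0))
            out[dst_y, dst_x] |= binary[src_y, src_x]
    return (out * 255).astype(np.uint8)

mask = np.zeros((9, 9), dtype=np.uint8)
mask[4, 4] = 255                  # single foreground pixel
grown = dilate_mask(mask, 2)      # grows into a 5x5 white square
```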
#### 3. Background Edit

```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir \ # output directory
    --input_path \ # the reference video path
    --repaint \ # first-frame repaint image path for the source video, or use FLUX to repaint the first frame
    --mask_path \ # white (255) marks the foreground to keep unchanged; the background is the area to be edited
    --video_length=97 \
    --sample_size 512 896 \
    --generate_type='background_edit' \
    --density 10 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
#### 4. Camera Control

We provide three camera control methods: (1) use a predefined template; (2) use a pose text file; (3) input another video, from which Pi3 automatically estimates the camera pose and applies it to the video to be generated.
##### 1. Use predefined templates
We provide several template camera motion types; you can choose one of them. In practice, we find that describing the camera motion in the prompt yields better results.
```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir \ # output directory
    --input_path \ # the reference image or video path
    --camera_motion \ # the camera motion type, see examples below
    --tracking_method \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --override_extrinsics \ # how to apply camera motion: "override" to replace the original camera, "append" to build upon it
    --video_length=97 \
    --sample_size 512 896 \
    --density 5 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
Here are some tips for camera motion:
- `trans`: translation; the camera moves along the vector (dx, dy, dz), each component in [-1, 1]
  - Positive X: move left; negative X: move right
  - Positive Y: move down; negative Y: move up
  - Positive Z: zoom out; negative Z: zoom in
  - e.g., `trans -0.1 -0.1 -0.1` moves right, moves up, and zooms in
  - e.g., `trans -0.1 0.0 0.0 5 45` moves right 0.1 from frame 5 to 45
- `rot`: rotation; the camera rotates around the given axis by the given angle (degrees)
  - X-axis: positive pitches down; negative pitches up
  - Y-axis: positive yaws left; negative yaws right
  - Z-axis: positive rolls counter-clockwise; negative rolls clockwise
  - e.g., `rot y 25` rotates 25 degrees around the y-axis (yaw left)
  - e.g., `rot x -30 10 40` rotates -30 degrees around the x-axis (pitch up) from frame 10 to 40
- `spiral`: the camera moves along a spiral path with the given radius
  - e.g., `spiral 2` spiral motion with radius 2
  - e.g., `spiral 2 15 35` spiral motion with radius 2 from frame 15 to 35

Multiple transformations can be combined using a semicolon (;) as a separator:
- e.g., `trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48`

This will:
1. Zoom in (z = -0.5) from frame 0 to 30
2. Pitch up (rotate -25 degrees around the x-axis) from frame 0 to 30
3. Move right (x = -0.1) from frame 30 to 48

Notes:
- If start_frame and end_frame are not specified, the motion is applied across all frames (0-48)
- Frames after end_frame keep the final transformation
- Combined transformations are applied in sequence
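To make the motion-string format concrete, here is a hypothetical Python sketch of a parser for the semicolon-separated specification. The function name and the output structure are illustrative assumptions, not the repository's actual parser.

```python
def parse_camera_motion(spec: str, num_frames: int = 49):
    """Parse a spec like 'trans 0 0 -0.5 0 30; rot x -25 0 30' into a list
    of (kind, params, start_frame, end_frame) tuples.

    Hypothetical sketch; the real FlexAM parser may differ.
    """
    motions = []
    for part in spec.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        kind = tokens[0]
        if kind == "trans":
            # trans dx dy dz [start end]
            params = tuple(float(t) for t in tokens[1:4])
            extra = tokens[4:]
        elif kind == "rot":
            # rot axis angle [start end]
            params = (tokens[1], float(tokens[2]))
            extra = tokens[3:]
        elif kind == "spiral":
            # spiral radius [start end]
            params = (float(tokens[1]),)
            extra = tokens[2:]
        else:
            raise ValueError(f"unknown motion type: {kind}")
        # Default: apply over all frames when no range is given.
        start, end = (int(extra[0]), int(extra[1])) if extra else (0, num_frames - 1)
        motions.append((kind, params, start, end))
    return motions

motions = parse_camera_motion("trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48")
```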
##### 2. Use a pose text file (pose txt)
```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir \ # output directory
    --input_path \ # the reference image or video path
    --camera_motion "path" \ # if the camera motion type is "path", --pose_file is needed
    --pose_file \ # txt file of camera poses; each line corresponds to one frame
    --tracking_method \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --override_extrinsics \ # how to apply camera motion: "override" to replace the original camera, "append" to build upon it
    --video_length=97 \
    --sample_size 512 896 \
    --density 5 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
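The exact pose-file layout is not specified here; as a hedged illustration, the sketch below assumes each line holds the 12 row-major entries of a 3x4 camera extrinsic matrix (a common convention in pose text files). Verify against the repository's sample pose files before relying on it.

```python
import os
import tempfile

import numpy as np

def load_pose_file(path: str) -> np.ndarray:
    """Read a pose txt where each line is one frame.

    Assumed layout (verify against the repo's samples): 12 floats per line,
    the row-major 3x4 extrinsic [R|t]. Returns (N, 4, 4) homogeneous matrices.
    """
    poses = []
    with open(path) as f:
        for line in f:
            vals = [float(v) for v in line.split()]
            if not vals:
                continue  # skip blank lines
            mat = np.eye(4)
            mat[:3, :4] = np.array(vals).reshape(3, 4)
            poses.append(mat)
    return np.stack(poses)

# Round-trip example: write two identity poses, then read them back.
path = os.path.join(tempfile.gettempdir(), "flexam_pose_demo.txt")
with open(path, "w") as f:
    for _ in range(2):
        f.write(" ".join(str(v) for v in np.eye(4)[:3].ravel()) + "\n")
poses = load_pose_file(path)
```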
##### 3. Input another video to extract the camera pose
```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --output_dir \ # output directory
    --input_path \ # the reference image or video path
    --camera_motion "path" \ # if the camera motion type is "path", --pose_file is needed
    --pose_file \ # Pi3 automatically estimates the camera pose from this video file
    --tracking_method \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --override_extrinsics \ # how to apply camera motion: "override" to replace the original camera, "append" to build upon it
    --video_length=97 \
    --sample_size 512 896 \
    --density 5 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
#### 5. Object Manipulation

We provide several template object manipulation types; you can choose one of them. In practice, we find that describing the object motion in the prompt yields better results.
```shell
python demo.py \
    --prompt <"prompt text"> \ # prompt text
    --checkpoint_path \ # FlexAM checkpoint path (e.g. checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
    --input_path \ # the reference image path
    --object_motion \ # the object motion type (up, down, left, right)
    --object_mask \ # the object mask path
    --tracking_method \ # the tracking method (moge, DELTA); for image input, 'moge' is necessary
    --sample_size 512 896 \
    --video_length=49 \
    --density 30 \ # controls the sparsity of tracking points
    --gpu # the GPU id
```
Note that, depending on the tracker you choose, you may need to adjust the translation scale.
## 🙏 Acknowledgements
This project builds upon several excellent open source projects:
* [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
* [DELTA](https://github.com/snap-research/DELTA_densetrack3d)
* [MoGe](https://github.com/microsoft/MoGe)
* [vggt](https://github.com/facebookresearch/vggt)
* [Pi3](https://github.com/yyfz/Pi3)
We thank the authors and contributors of these projects for their valuable contributions to the open source community!
## 🌟 Citation
If you find FlexAM useful for your research, please cite our paper:
```
@misc{sheng2026FlexAM,
  title={FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control},
  author={Sheng, Mingzhi and Gu, Zekai and Li, Peng and Lin, Cheng and Guo, Hao-Xiang and Chen, Ying-Cong and Liu, Yuan},
  year={2026},
  eprint={2602.13185},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.13185},
}
```
## ⚖️ License
This model checkpoint is based on **FlexAM**.
- **Model Architecture / Code**: Licensed under **Apache 2.0** (or CC-BY-SA 4.0, consistent with the GitHub repository).
- **Embedded DELTA Weights**: This checkpoint contains weights from **DELTA (Snap Inc.)**, which are restricted to **Non-Commercial, Research-Only** use.
**⚠️ Usage Note:**
By downloading or using these weights, you agree to comply with the **Snap Inc. License** regarding the DELTA modules. Please refer to the [LICENSE](./LICENSE) file in this repository for the full text.