---
license: other
library_name: diffusers
tags:
- motion-transfer
- comfyui
- video-generation
- image-to-video
- video-edit
pipeline_tag: video-to-video
base_model:
- alibaba-pai/Wan2.2-Fun-5B-Control
---
<div align="center">
# FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control
<a href="https://arxiv.org/abs/2602.13185"><img src="https://img.shields.io/badge/arXiv-2602.13185-b31b1b.svg" alt="arXiv"></a>
<a href="https://github.com/IGL-HKUST/FlexAM"><img src="https://img.shields.io/badge/GitHub-Repository-181717.svg?logo=github&logoColor=white" alt="GitHub"></a>
<a href="assets/flexam_workflow.json"><img src="https://img.shields.io/badge/ComfyUI-Download_Workflow-4fd63d" alt="ComfyUI"></a>
<br>
<br>
Mingzhi Sheng<sup>1*</sup>, Zekai Gu<sup>2*</sup>, Peng Li<sup>2</sup>, Cheng Lin<sup>3</sup>, Hao-Xiang Guo<sup>4</sup>, Ying-Cong Chen<sup>1,2†</sup>, Yuan Liu<sup>2†</sup>
<br>
<sup>1</sup>HKUST(GZ), <sup>2</sup>HKUST, <sup>3</sup>MUST, <sup>4</sup>Tsinghua University
<br>
<small><sup>*</sup>Equal Contribution, <sup>†</sup>Corresponding Authors</small>
</div>
<br>
![teaser](assets/teaser.gif)
## 📰 News
- **[2026.02.14]** 📄 The paper is available on arXiv.
- **[2026.02.13]** 🚀 We have released the inference code and **ComfyUI** support!
## 🛠️ Installation
> 📢 **System Requirements**: Both the official Python inference code and the ComfyUI workflow were tested on **Ubuntu 20.04** with **Python 3.10**, **PyTorch 2.5.1**, and **CUDA 12.1** on an **NVIDIA A800** GPU.
Before running any inference (Python or ComfyUI), please set up the environment and download the checkpoints.
### 1. Create environment
Clone the repository and create a conda environment:
```
git clone https://github.com/IGL-HKUST/FlexAM
cd FlexAM
conda create -n flexam python=3.10
conda activate flexam
```
Install PyTorch; we recommend `PyTorch 2.5.1` with `CUDA 12.1`:
```
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121
```
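To confirm that the CUDA build of PyTorch is active, you can optionally run a quick sanity check (this snippet is just for verification, not part of the repository):

```python
# Optional sanity check: confirm the CUDA build of PyTorch is installed.
import torch

print(torch.__version__)          # expected: 2.5.1+cu121
print(torch.cuda.is_available())  # expected: True on a CUDA-capable machine
```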
Then install the remaining dependencies:
```
pip install -r requirements.txt
```
### 2. Download Submodules
We rely on several external modules (MoGe, Pi3, etc.), which are managed as git submodules:
```
mkdir -p submodules
git submodule update --init --recursive
pip install -r requirements.txt
```
<details>
<summary><em>(Optional) Manual clone if submodule update fails</em></summary>

```
# DELTA
git clone https://github.com/snap-research/DELTA_densetrack3d.git submodules/DELTA_densetrack3d
# Pi3
git clone https://github.com/yyfz/Pi3.git submodules/Pi3
# MoGe
git clone https://github.com/microsoft/MoGe.git submodules/MoGe
# VGGT
git clone https://github.com/facebookresearch/vggt.git submodules/vggt
```

</details>
### 3. Download checkpoints
Download the FlexAM checkpoint and place it in the `checkpoints/` directory.
- HuggingFace Link: [Wan2.2-Fun-5B-FLEXAM](https://huggingface.co/SandwichZ/Wan2.2-Fun-5B-FLEXAM)
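For example, you can fetch the checkpoint with `huggingface_hub` (one possible approach; any download method works, and the `local_dir` below mirrors the example checkpoint path used in the commands later in this README):

```python
# One way to download the checkpoint; the local_dir matches the example
# path used by the demo commands below.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="SandwichZ/Wan2.2-Fun-5B-FLEXAM",
    local_dir="checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM",
)
```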
## 🚀 Inference
We provide two ways to use FlexAM: a ComfyUI workflow and a Python script.
### Option A: ComfyUI Integration
We provide a native node for seamless integration into ComfyUI workflows.
> ⚠️ **Note**: Currently, the ComfyUI node supports **Motion Transfer**, **Foreground Edit**, and **Background Edit**. For *Camera Control* and *Object Manipulation*, please use the Python script.
#### 1. Install Node
Since the node is not yet available in the ComfyUI Manager, please install it manually:
```
cd ComfyUI/custom_nodes/
git clone https://github.com/IGL-HKUST/FlexAM
cd FlexAM
pip install -r requirements.txt
```
#### 2. Run Workflow
- Step 1: Download the workflow JSON: [workflow.json](assets/flexam_workflow.json)
- Step 2: Drag and drop it into ComfyUI.
- Step 3: Ensure checkpoints are in `ComfyUI/models/checkpoints`.
### Option B: Python Script
We provide an inference script for all tasks; please refer to `run_demo.sh` for examples of running `demo.py`.
Alternatively, you can run each task individually as follows.
#### 1. Motion Transfer
![motion_transfer](assets/motion_transfer.gif)
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--output_dir <output_dir> \ # output directory
--input_path <input_path> \ # path to the reference video
--repaint <True/repaint_path> \ # path to a repainted first frame of the source video, or True to repaint the first frame with FLUX
--video_length=97 \
--sample_size 512 896 \
--generate_type='full_edit' \
--density 10 \ # controls the sparsity of tracking points
--gpu <gpu_id> # the GPU id
```
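If you want to repaint the first frame yourself rather than letting FLUX do it, you first need that frame as an image. A minimal sketch, assuming OpenCV is available; `input.mp4` and `first_frame.png` are placeholder paths:

```python
# Hedged helper (not part of the repo): extract the first frame of the
# source video so it can be repainted externally and passed via --repaint.
import cv2

cap = cv2.VideoCapture("input.mp4")    # placeholder path
ok, frame = cap.read()
assert ok, "could not read the first frame"
cv2.imwrite("first_frame.png", frame)  # repaint this image, then pass it to --repaint
cap.release()
```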
#### 2. Foreground Edit
![fg_edit](assets/fg_edit.gif)
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--output_dir <output_dir> \ # output directory
--input_path <input_path> \ # path to the reference video
--repaint <True/repaint_path> \ # path to a repainted first frame of the source video, or True to repaint the first frame with FLUX
--mask_path <mask_path> \ # white (255) marks the foreground to be edited; black (0) remains unchanged
--video_length=97 \
--sample_size 512 896 \
--generate_type='foreground_edit' \
--dilation_pixels=30 \ # dilation pixels for mask processing in foreground_edit mode
--density 10 \ # controls the sparsity of tracking points
--gpu <gpu_id> # the GPU id
```
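To preview how `--dilation_pixels` grows the editable (white) region before running the model, here is a hedged sketch using OpenCV (the file names are placeholders; the repository performs its own dilation internally):

```python
# Hedged preview (not part of the repo): visualize how --dilation_pixels=30
# enlarges the white (editable) region of a foreground mask.
import cv2
import numpy as np

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # white = edit, black = keep
kernel = np.ones((30, 30), np.uint8)                 # mirrors --dilation_pixels=30
dilated = cv2.dilate(mask, kernel, iterations=1)
cv2.imwrite("mask_dilated_preview.png", dilated)
```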
#### 3. Background Edit
![bg_edit](assets/bg_edit.gif)
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--output_dir <output_dir> \ # output directory
--input_path <input_path> \ # path to the reference video
--repaint <True/repaint_path> \ # path to a repainted first frame of the source video, or True to repaint the first frame with FLUX
--mask_path <mask_path> \ # white (255) marks the foreground to keep unchanged; the black background marks the area to be edited
--video_length=97 \
--sample_size 512 896 \
--generate_type='background_edit' \
--density 10 \ # controls the sparsity of tracking points
--gpu <gpu_id> # the GPU id
```
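Note that the two editing modes use opposite mask conventions (`foreground_edit`: white = edit; `background_edit`: white = keep), so an existing mask can be reused across modes by inverting it. A minimal sketch, assuming Pillow and NumPy with placeholder file names:

```python
# Hedged helper (not part of the repo): invert a mask to switch between the
# foreground_edit (white = edit) and background_edit (white = keep) conventions.
import numpy as np
from PIL import Image

mask = np.array(Image.open("mask.png").convert("L"))
Image.fromarray(255 - mask).save("mask_inverted.png")
```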
#### 4. Camera Control
![camera_ctrl](assets/camera_ctrl.gif)
We provide three camera control methods: (1) use predefined templates; (2) use a pose text file; (3) input another video, from which Pi3 automatically estimates the camera pose and applies it to the video to be generated.
##### 1. Use predefined templates
We provide several template camera motion types; you can choose any of them. In practice, we find that describing the camera motion in the prompt gives better results.
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--output_dir <output_dir> \ # output directory
--input_path <input_path> \ # path to the reference image or video
--camera_motion <camera_motion> \ # the camera motion type, see examples below
--tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is required
--override_extrinsics <override/append> \ # how to apply camera motion: "override" replaces the original camera, "append" builds upon it
--video_length=97 \
--sample_size 512 896 \
--density 5 \ # controls the sparsity of tracking points
--gpu <gpu_id> # the GPU id
```
Here are some tips for camera motion (a parsing sketch of this syntax follows the notes below):
- trans: translation motion; the camera moves along the vector (dx, dy, dz), each component in [-1, 1]
  - Positive X: move left; negative X: move right
  - Positive Y: move down; negative Y: move up
  - Positive Z: zoom in; negative Z: zoom out
  - e.g., 'trans -0.1 -0.1 -0.1' moves right, up, and zooms out
  - e.g., 'trans -0.1 0.0 0.0 5 45' moves right by 0.1 from frame 5 to 45
- rot: rotation motion; the camera rotates around the given axis by the given angle
  - X-axis rotation: positive X: pitch down; negative X: pitch up
  - Y-axis rotation: positive Y: yaw left; negative Y: yaw right
  - Z-axis rotation: positive Z: roll counter-clockwise; negative Z: roll clockwise
  - e.g., 'rot y 25' rotates 25 degrees around the y-axis (yaw left)
  - e.g., 'rot x -30 10 40' rotates -30 degrees around the x-axis (pitch up) from frame 10 to 40
- spiral: spiral motion; the camera moves along a spiral path with the given radius
  - e.g., 'spiral 2' spiral motion with radius 2
  - e.g., 'spiral 2 15 35' spiral motion with radius 2 from frame 15 to 35

Multiple transformations can be combined using a semicolon (;) as the separator:
- e.g., "trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48"

This will:
1. Zoom out (z = -0.5) from frame 0 to 30
2. Pitch up (rotate -25 degrees around the x-axis) from frame 0 to 30
3. Move right (x = -0.1) from frame 30 to 48

Notes:
- If start_frame and end_frame are not specified, the motion is applied to all frames (0-48)
- Frames after end_frame maintain the final transformation
- Combined transformations are applied in sequence
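To make the grammar above concrete, here is a hedged sketch of how a `--camera_motion` string could be parsed into (type, params, start_frame, end_frame) tuples. This is illustrative only; the repository's actual parser may differ:

```python
# Hedged sketch (not the repo's parser): split a --camera_motion string into
# (kind, params, start_frame, end_frame) tuples following the grammar above.
def parse_camera_motion(spec, num_frames=49):
    arity = {"trans": 3, "rot": 2, "spiral": 1}  # parameter count per motion type
    motions = []
    for part in spec.split(";"):
        tokens = part.split()
        kind = tokens[0]
        n = arity[kind]
        params = tokens[1:1 + n]                 # e.g. dx dy dz for "trans"
        frames = tokens[1 + n:]                  # optional start/end frame pair
        start, end = (int(frames[0]), int(frames[1])) if frames else (0, num_frames - 1)
        motions.append((kind, params, start, end))
    return motions

print(parse_camera_motion("trans 0 0 -0.5 0 30; rot x -25 0 30; trans -0.1 0 0 30 48"))
# [('trans', ['0', '0', '-0.5'], 0, 30), ('rot', ['x', '-25'], 0, 30), ('trans', ['-0.1', '0', '0'], 30, 48)]
```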
##### 2. Use a pose text file (pose txt)
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--output_dir <output_dir> \ # output directory
--input_path <input_path> \ # path to the reference image or video
--camera_motion "path" \ # if the camera motion type is "path", --pose_file is required
--pose_file <pose_file_txt> \ # txt file of camera poses; each line corresponds to one frame
--tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is required
--override_extrinsics <override/append> \ # how to apply camera motion: "override" replaces the original camera, "append" builds upon it
--video_length=97 \
--sample_size 512 896 \
--density 5 \ # controls the sparsity of tracking points
--gpu <gpu_id> # the GPU id
```
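The exact pose-file layout is defined by the repository, so check its sample files before writing your own. As an illustration only, here is a hypothetical loader that assumes each line stores a flattened 3x4 world-to-camera extrinsic (12 floats); the real format may differ:

```python
# HYPOTHETICAL loader: ASSUMES each line holds 12 floats forming a 3x4
# extrinsic matrix, one line per frame. Verify against the repository's
# sample pose files; the actual format may differ (e.g., include intrinsics).
import numpy as np

def load_poses(path):
    poses = []
    with open(path) as f:
        for line in f:                             # one line per frame
            if not line.strip():
                continue
            vals = np.array(line.split(), dtype=float)
            poses.append(vals.reshape(3, 4))
    return np.stack(poses)                         # (num_frames, 3, 4)
```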
##### 3. Use another video to extract the camera pose
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--output_dir <output_dir> \ # output directory
--input_path <input_path> \ # path to the reference image or video
--camera_motion "path" \ # if the camera motion type is "path", --pose_file is required
--pose_file <pose_file_mp4> \ # Pi3 automatically estimates the camera pose from this video file
--tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is required
--override_extrinsics <override/append> \ # how to apply camera motion: "override" replaces the original camera, "append" builds upon it
--video_length=97 \
--sample_size 512 896 \
--density 5 \ # controls the sparsity of tracking points
--gpu <gpu_id> # the GPU id
```
#### 5. Object Manipulation
![object](assets/object.gif)
We provide several template object manipulation types; you can choose any of them. In practice, we find that describing the object motion in the prompt gives better results.
```bash
python demo.py \
--prompt <"prompt text"> \ # prompt text
--checkpoint_path <model_path> \ # FlexAM checkpoint path (e.g., checkpoints/Diffusion_Transformer/Wan2.2-Fun-5B-FLEXAM)
--input_path <input_path> \ # path to the reference image
--object_motion <object_motion> \ # the object motion type (up, down, left, right)
--object_mask <object_mask_path> \ # path to the object mask
--tracking_method <tracking_method> \ # the tracking method (moge, DELTA); for image input, 'moge' is required
--sample_size 512 896 \
--video_length=49 \
--density 30 \
--gpu <gpu_id> # the GPU id
```
Note that, depending on the tracker you choose, you may need to adjust the translation scale.
## 🙏 Acknowledgements
This project builds upon several excellent open source projects:
* [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
* [DELTA](https://github.com/snap-research/DELTA_densetrack3d)
* [MoGe](https://github.com/microsoft/MoGe)
* [vggt](https://github.com/facebookresearch/vggt)
* [Pi3](https://github.com/yyfz/Pi3)
We thank the authors and contributors of these projects for their valuable contributions to the open source community!
## 🌟 Citation
If you find FlexAM useful for your research, please cite our paper:
```
@misc{sheng2026FlexAM,
  title={FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control},
  author={Sheng, Mingzhi and Gu, Zekai and Li, Peng and Lin, Cheng and Guo, Hao-Xiang and Chen, Ying-Cong and Liu, Yuan},
  year={2026},
  eprint={2602.13185},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.13185},
}
```
## ⚖️ License
This model checkpoint is based on **FlexAM**.
- **Model Architecture / Code**: Licensed under **Apache 2.0** (or CC-BY-SA 4.0, consistent with the GitHub repository).
- **Embedded DELTA Weights**: This checkpoint contains weights from **DELTA (Snap Inc.)**, which are restricted to **Non-Commercial, Research-Only** use.
**⚠️ Usage Note:**
By downloading or using these weights, you agree to comply with the **Snap Inc. License** regarding the DELTA modules. Please refer to the [LICENSE](./LICENSE) file in this repository for the full text.