SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
Paper: arXiv:2603.19228
This repository provides an FP8-quantized version of syxbb/SAMA-14B.
SAMA (Factorized Semantic Anchoring and Motion Alignment) is a framework that factorizes instruction-guided video editing into semantic anchoring and motion modeling. It improves edit precision while preserving the source video's temporal dynamics by establishing reliable visual anchors through sparse frame prediction.
git clone https://github.com/Cynthiazxy123/SAMA
cd SAMA
conda create -n sama python=3.10 -y
conda activate sama
pip install --upgrade pip
pip install -r requirements.txt
To run inference, prepare the base Wan2.1-T2V-14B model directory and the SAMA checkpoint, then edit the variables in the provided script (e.g., MODEL_ROOT, STATE_DICT, SRC_VIDEO, PROMPT) and run:
bash infer_sh/run_sama.sh
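A minimal sketch of the variables to set inside infer_sh/run_sama.sh before launching. The variable names come from the instructions above; all paths, filenames, and the prompt below are placeholders and must be replaced with your own:

```shell
# Placeholder values for the variables referenced in infer_sh/run_sama.sh.
MODEL_ROOT=/path/to/Wan2.1-T2V-14B          # base Wan2.1-T2V-14B model directory
STATE_DICT=/path/to/sama_checkpoint.pt      # SAMA checkpoint (hypothetical filename)
SRC_VIDEO=/path/to/source_video.mp4         # source video to edit
PROMPT="your editing instruction here"      # natural-language edit instruction
```

The script consumes these variables directly, so no command-line flags are needed once they are set.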
@misc{zhang2026samafactorizedsemanticanchoring,
title={SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing},
author={Xinyao Zhang and Wenkai Dong and Yuxin Song and Bo Fang and Qi Zhang and Jing Wang and Fan Chen and Hui Zhang and Haocheng Feng and Yu Lu and Hang Zhou and Chun Yuan and Jingdong Wang},
year={2026},
eprint={2603.19228},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.19228},
}
Credit for the original model goes to syxbb/SAMA-14B.