metadata
license: apache-2.0
datasets:
- syxbb/SAMA-edit-filtered-1M
pipeline_tag: video-to-video
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
Available Models
| Model | Status | Link |
|---|---|---|
| SAMA-5B | Coming soon | Coming soon |
| SAMA-14B | Available | syxbb/SAMA-14B |
Quick Start
This repository contains the weights of SAMA-14B. For further usage instructions, refer to the official GitHub repository.
Installation
Recommended environment:
- Linux
- NVIDIA GPU
- CUDA 12.1 or a compatible environment
- Python 3.10
git clone https://github.com/Cynthiazxy123/SAMA
cd SAMA
conda create -n sama python=3.10 -y
conda activate sama
pip install --upgrade pip
pip install -r requirements.txt
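After installation, a quick check can confirm that the local setup matches the recommended environment. The following is a minimal sketch, not part of the SAMA repository: the function name `check_environment` is hypothetical, and it assumes PyTorch is among the packages installed by requirements.txt. If PyTorch is missing, the CUDA fields are simply left unset.

```python
import platform
import sys


def check_environment():
    """Collect basic facts about the local setup (hypothetical helper)."""
    report = {
        "os": platform.system(),  # "Linux" is the recommended value
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",  # "3.10" recommended
        "torch_installed": False,
        "cuda_available": False,
        "torch_cuda_version": None,
    }
    try:
        import torch  # assumption: installed through requirements.txt
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["torch_cuda_version"] = torch.version.cuda  # None on CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `sama` conda environment and compare the printed values with the list under "Recommended environment".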
Inference
Prepare:
- The base Wan2.1-T2V-14B model directory.
- A SAMA checkpoint from Hugging Face.
- A source video and an edit instruction.
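One way to fetch the SAMA checkpoint is the `huggingface_hub` Python client. The sketch below assumes that package is installed and uses its `snapshot_download` function with the repository ID listed in the table above. The helper name `download_sama_checkpoint` and the default target directory are illustrative choices, not something the SAMA repository prescribes.

```python
def download_sama_checkpoint(local_dir="models/SAMA-14B"):
    """Download the syxbb/SAMA-14B snapshot into local_dir and return its path."""
    # Imported lazily so this module loads even when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id="syxbb/SAMA-14B", local_dir=local_dir)


if __name__ == "__main__":
    # Requires network access and enough free disk space for the weights.
    print(download_sama_checkpoint())
```

The base Wan2.1-T2V-14B model has to be obtained separately and is not covered by this sketch.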
The inference script is:
infer_sh/run_sama.sh
Edit the variables at the top of that script before running:
- MODEL_ROOT
- STATE_DICT
- SRC_VIDEO
- PROMPT
- OUTPUT_DIR
Then run:
bash infer_sh/run_sama.sh
The generated result will be saved to:
outputs/seed_1/<input_video_filename>
A recommended local model layout is:
models/
├── Wan2.1-T2V-14B/
│   ├── diffusion_pytorch_model-00001-of-00006.safetensors
│   ├── diffusion_pytorch_model-00002-of-00006.safetensors
│   ├── diffusion_pytorch_model-00003-of-00006.safetensors
│   ├── diffusion_pytorch_model-00004-of-00006.safetensors
│   ├── diffusion_pytorch_model-00005-of-00006.safetensors
│   ├── diffusion_pytorch_model-00006-of-00006.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
└── SAMA-14B/
    └── <downloaded_checkpoint>.safetensors
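Before launching inference, the base model directory can be compared against the layout above. This is a minimal sketch with a hypothetical helper, `missing_entries`. The list of expected names is copied from the recommended layout, and whether the inference script checks exactly this set is an assumption.

```python
from pathlib import Path

# Names taken from the recommended layout; the script's own check may differ.
EXPECTED_BASE_ENTRIES = [
    *[f"diffusion_pytorch_model-{i:05d}-of-00006.safetensors" for i in range(1, 7)],
    "models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1_VAE.pth",
    "google",
]


def missing_entries(model_root):
    """Return the expected files or directories that are absent from model_root."""
    root = Path(model_root)
    return [name for name in EXPECTED_BASE_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("models/Wan2.1-T2V-14B")
    if missing:
        print("Missing from base model directory:")
        for name in missing:
            print(f"  {name}")
    else:
        print("Base model directory looks complete.")
```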
Notes
- Input frames are automatically padded to satisfy the 4k+1 frame requirement used by Wan video inference.
- The output video uses the source video FPS when available; otherwise it falls back to --fps.
- If --model-root is incomplete, the script will stop and report the missing files or directories.
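The frame-count and FPS rules in the notes above can be illustrated with a little arithmetic. The sketch below only computes the target values. The helper names `padded_frame_count` and `resolve_fps` are hypothetical, and how the script actually fills the extra frames (for example by repeating the last frame) is not stated here.

```python
def padded_frame_count(num_frames):
    """Return the smallest count of the form 4k + 1 that is >= num_frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    remainder = (num_frames - 1) % 4
    return num_frames if remainder == 0 else num_frames + (4 - remainder)


def resolve_fps(source_fps, fallback_fps):
    """Prefer the source video's FPS; use the fallback when it is unavailable."""
    return source_fps if source_fps else fallback_fps


if __name__ == "__main__":
    print(padded_frame_count(80))   # 81, since 81 = 4 * 20 + 1
    print(padded_frame_count(81))   # 81, already of the form 4k + 1
    print(resolve_fps(None, 16))    # 16, the fallback is used
```

An 80-frame clip would therefore be processed as 81 frames, while an 81-frame clip needs no padding.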
Citation
@misc{zhang2026samafactorizedsemanticanchoring,
title={SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing},
author={Xinyao Zhang and Wenkai Dong and Yuxin Song and Bo Fang and Qi Zhang and Jing Wang and Fan Chen and Hui Zhang and Haocheng Feng and Yu Lu and Hang Zhou and Chun Yuan and Jingdong Wang},
year={2026},
eprint={2603.19228},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.19228},
}