---
license: apache-2.0
datasets:
- syxbb/SAMA-edit-filtered-1M
pipeline_tag: video-to-video
---
# SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
## 🤗 Available Models
| Model | Status | Link |
| --- | --- | --- |
| SAMA-5B | Coming soon | Coming soon |
| SAMA-14B | Available | [syxbb/SAMA-14B](https://huggingface.co/syxbb/SAMA-14B) |
## 🚀 Quick Start
This repository contains the SAMA-14B weights. For detailed usage instructions, please refer to the [official GitHub repository](https://github.com/Cynthiazxy123/SAMA).
### Installation
Recommended environment:
- Linux
- NVIDIA GPU
- CUDA 12.1 or a compatible environment
- Python 3.10
```bash
git clone https://github.com/Cynthiazxy123/SAMA
cd SAMA
conda create -n sama python=3.10 -y
conda activate sama
pip install --upgrade pip
pip install -r requirements.txt
```
### Inference
Before running inference, prepare the following:
1. The base `Wan2.1-T2V-14B` model directory.
2. A SAMA checkpoint from [Hugging Face](https://huggingface.co/syxbb/SAMA-14B).
3. A source video and an edit instruction.
The inference script is `infer_sh/run_sama.sh`.
Edit the variables at the top of that script before running:
- `MODEL_ROOT`
- `STATE_DICT`
- `SRC_VIDEO`
- `PROMPT`
- `OUTPUT_DIR`
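As a sketch, the top of the script might be configured like this. Every path and the prompt below are illustrative placeholders, not the repository's defaults; substitute your own local paths:

```shell
# Illustrative configuration only -- all values here are placeholders.
MODEL_ROOT="models/Wan2.1-T2V-14B"                   # base Wan2.1-T2V-14B directory
STATE_DICT="models/SAMA-14B/<checkpoint>.safetensors" # SAMA checkpoint (placeholder name)
SRC_VIDEO="inputs/source.mp4"                        # source video to edit
PROMPT="Turn the red car into a blue one"            # example edit instruction
OUTPUT_DIR="outputs"                                 # where results are written
```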
Then run:
```bash
bash infer_sh/run_sama.sh
```
The generated result will be saved to:
```text
outputs/seed_1/
```
A recommended local model layout is:
```text
models/
├── Wan2.1-T2V-14B/
│   ├── diffusion_pytorch_model-00001-of-00006.safetensors
│   ├── diffusion_pytorch_model-00002-of-00006.safetensors
│   ├── diffusion_pytorch_model-00003-of-00006.safetensors
│   ├── diffusion_pytorch_model-00004-of-00006.safetensors
│   ├── diffusion_pytorch_model-00005-of-00006.safetensors
│   ├── diffusion_pytorch_model-00006-of-00006.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
└── SAMA-14B/
    └── .safetensors
```
### Notes
- Input frames are automatically padded to satisfy the `4k+1` frame requirement used by Wan video inference.
- The output video uses the source video FPS when available; otherwise it falls back to `--fps`.
- If `--model-root` is incomplete, the script will stop and report the missing files or directories.
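The `4k+1` padding arithmetic can be sketched as follows. This is an illustration of the rule, not the repository's actual implementation:

```python
def pad_to_4k_plus_1(num_frames: int) -> int:
    """Smallest frame count >= num_frames of the form 4k + 1 (k >= 0)."""
    if num_frames <= 1:
        return 1
    # Round (num_frames - 1) up to the next multiple of 4, then add the +1 frame.
    return ((num_frames - 2) // 4 + 1) * 4 + 1

# A loader would then repeat (or otherwise pad) the last frame
# until the clip reaches this length.
```

For example, a 6-frame clip would be padded to 9 frames, while a 17-frame clip (already of the form 4k+1) is left unchanged.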
## 📚 Citation
```bibtex
@misc{zhang2026samafactorizedsemanticanchoring,
  title={SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing},
  author={Xinyao Zhang and Wenkai Dong and Yuxin Song and Bo Fang and Qi Zhang and Jing Wang and Fan Chen and Hui Zhang and Haocheng Feng and Yu Lu and Hang Zhou and Chun Yuan and Jingdong Wang},
  year={2026},
  eprint={2603.19228},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.19228},
}
```