---
license: apache-2.0
datasets:
- syxbb/SAMA-edit-filtered-1M
pipeline_tag: video-to-video
---
# SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
## 🤗 Available Models
| Model | Status | Link |
| --- | --- | --- |
| SAMA-5B | Coming soon | Coming soon |
| SAMA-14B | Available | [syxbb/SAMA-14B](https://huggingface.co/syxbb/SAMA-14B) |
## 🚀 Quick Start
This repository contains the SAMA-14B weights. For detailed usage instructions, please refer to the [official GitHub repository](https://github.com/Cynthiazxy123/SAMA).
### Installation
Recommended environment:
- Linux
- NVIDIA GPU
- CUDA 12.1 or a compatible environment
- Python 3.10
```bash
git clone https://github.com/Cynthiazxy123/SAMA
cd SAMA
conda create -n sama python=3.10 -y
conda activate sama
pip install --upgrade pip
pip install -r requirements.txt
```
### Inference
Before running inference, prepare the following:
1. The base `Wan2.1-T2V-14B` model directory.
2. A SAMA checkpoint from [Hugging Face](https://huggingface.co/syxbb/SAMA-14B).
3. A source video and an edit instruction.
The inference script is `infer_sh/run_sama.sh`.
Edit the variables at the top of that script before running:
- `MODEL_ROOT`
- `STATE_DICT`
- `SRC_VIDEO`
- `PROMPT`
- `OUTPUT_DIR`
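As a sketch, the top of the script might be configured like this. Every path and the prompt below are illustrative placeholders, not the repository's defaults; substitute your own local paths:

```shell
# Illustrative configuration only -- all values here are placeholders.
MODEL_ROOT="models/Wan2.1-T2V-14B"                   # base Wan2.1-T2V-14B directory
STATE_DICT="models/SAMA-14B/<checkpoint>.safetensors" # SAMA checkpoint (placeholder name)
SRC_VIDEO="inputs/source.mp4"                        # source video to edit
PROMPT="Turn the red car into a blue one"            # example edit instruction
OUTPUT_DIR="outputs"                                 # where results are written
```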
Then run:
```bash
bash infer_sh/run_sama.sh
```
The generated result will be saved to:
```text
outputs/seed_1/
```
A recommended local model layout is:
```text
models/
├── Wan2.1-T2V-14B/
│   ├── diffusion_pytorch_model-00001-of-00006.safetensors
│   ├── diffusion_pytorch_model-00002-of-00006.safetensors
│   ├── diffusion_pytorch_model-00003-of-00006.safetensors
│   ├── diffusion_pytorch_model-00004-of-00006.safetensors
│   ├── diffusion_pytorch_model-00005-of-00006.safetensors
│   ├── diffusion_pytorch_model-00006-of-00006.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
└── SAMA-14B/
    └── .safetensors
```
### Notes
- Input frames are automatically padded to satisfy the `4k+1` frame requirement used by Wan video inference.
- The output video uses the source video FPS when available; otherwise it falls back to `--fps`.
- If `--model-root` is incomplete, the script will stop and report the missing files or directories.
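The `4k+1` padding arithmetic can be sketched as follows. This is an illustration of the rule, not the repository's actual implementation:

```python
def pad_to_4k_plus_1(num_frames: int) -> int:
    """Smallest frame count >= num_frames of the form 4k + 1 (k >= 0)."""
    if num_frames <= 1:
        return 1
    # Round (num_frames - 1) up to the next multiple of 4, then add the +1 frame.
    return ((num_frames - 2) // 4 + 1) * 4 + 1

# A loader would then repeat (or otherwise pad) the last frame
# until the clip reaches this length.
```

For example, a 6-frame clip would be padded to 9 frames, while a 17-frame clip (already of the form 4k+1) is left unchanged.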
## 📚 Citation
```bibtex
@misc{zhang2026samafactorizedsemanticanchoring,
  title={SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing},
  author={Xinyao Zhang and Wenkai Dong and Yuxin Song and Bo Fang and Qi Zhang and Jing Wang and Fan Chen and Hui Zhang and Haocheng Feng and Yu Lu and Hang Zhou and Chun Yuan and Jingdong Wang},
  year={2026},
  eprint={2603.19228},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.19228},
}
```