---
pipeline_tag: video-to-video
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
---

# DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

We propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation.

[**Project Page**](https://jeffreyyzh.github.io/DecMem-Page/) | [**Paper**](https://arxiv.org/abs/2605.31336) | [**Code**](https://github.com/KlingAIResearch/DecMem)

## Checkpoints

Download the Wan2.1 backbone (VAE + tokenizer weights used by the pipeline):

```bash
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
  --local-dir-use-symlinks False \
  --local-dir wan_models/Wan2.1-T2V-1.3B
```

Download the trained DecMem checkpoints from Hugging Face:

```bash
huggingface-cli download KlingTeam/DecMem --local-dir checkpoints
```

Checkpoint layout expected by the training and inference scripts:

```
checkpoints/
└── decmem.pt        # released weights
```

## Quick start

We provide example video-pose pairs for quick inference. Inference runs block by block, using causal denoising with a KV cache.

```bash
bash scripts/infer_example.sh
```

## Citation

If you find our work helpful, please cite our paper:

```bibtex
@misc{yang2026decmemminutelongconsistentworld,
  title={DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory},
  author={Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong},
  year={2026},
  eprint={2605.31336},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.31336},
}
```
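
## Checking the download layout

Before running the quick-start script, it can help to confirm that both downloads landed where the Checkpoints section says they should. The following is a minimal sketch, not part of the released code: the function name `check_decmem_layout` is hypothetical, and it assumes the two download commands were run from the repository root, so that `checkpoints/decmem.pt` and `wan_models/Wan2.1-T2V-1.3B` sit side by side.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DecMem): verify that the paths
# named in the Checkpoints section exist under a given root directory.
check_decmem_layout() {
  local root="${1:-.}"   # directory the downloads were run from (assumed)
  local missing=0
  local path
  for path in "checkpoints/decmem.pt" "wan_models/Wan2.1-T2V-1.3B"; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path" >&2
      missing=1
    fi
  done
  # 0 when both paths exist, 1 when at least one is missing
  return "$missing"
}
```

After sourcing the function, `check_decmem_layout . && bash scripts/infer_example.sh` would start inference only when both paths are present. The check looks at path existence only; it does not validate file contents.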