| --- |
| pipeline_tag: video-to-video |
| license: apache-2.0 |
| language: |
| - en |
| base_model: |
| - Wan-AI/Wan2.1-T2V-1.3B |
| --- |
| |
| # DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory |
|
|
| We propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation. |
|
|
| [**Project Page**](https://jeffreyyzh.github.io/DecMem-Page/) | [**Paper**](https://arxiv.org/abs/2605.31336) | [**Code**](https://github.com/KlingAIResearch/DecMem) |
|
|
| ## Checkpoints |
|
|
| Download the Wan2.1 backbone (VAE + tokenizer weights used by the pipeline): |
|
|
| ```bash |
| huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \ |
| --local-dir-use-symlinks False \ |
| --local-dir wan_models/Wan2.1-T2V-1.3B |
| ``` |
|
|
| Download the trained DecMem checkpoints from Hugging Face: |
|
|
| ```bash |
| huggingface-cli download KlingTeam/DecMem --local-dir checkpoints |
| ``` |
|
|
| Checkpoint layout expected by the training and inference scripts: |
|
|
| ``` |
| checkpoints/ |
| └── decmem.pt # released weights |
| ``` |
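
As a quick sanity check before running anything, the sketch below reports whether the two download targets above are in place. `check_decmem_layout` is a hypothetical helper, not part of the DecMem repository. The paths are taken from the `--local-dir` arguments and the layout shown above. That the scripts resolve them relative to the directory you pass in is an assumption.

```bash
# Hypothetical helper (not shipped with DecMem): check that the paths
# created by the download commands exist under a given root directory.
# Assumption: the scripts look for these paths relative to that root.
check_decmem_layout() {
    root="${1:-.}"      # directory to inspect, defaults to the current one
    missing=0
    for rel in "checkpoints/decmem.pt" "wan_models/Wan2.1-T2V-1.3B"; do
        if [ -e "$root/$rel" ]; then
            echo "ok       $rel"
        else
            echo "missing  $rel"
            missing=1
        fi
    done
    # Exit status 0 only when every expected path was found.
    return "$missing"
}
```

Calling `check_decmem_layout .` from the directory where you ran the downloads prints one line per path and returns a non-zero status if either is missing, so it can gate a later command with `&&`.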
|
|
| ## Quick start |
|
|
| We provide example video-pose pairs for quick inference. Inference runs block-by-block causal denoising with a KV cache. |
|
|
| ```bash |
| bash scripts/infer_example.sh |
| ``` |
|
|
| ## Citation |
| If you find our work helpful, please cite our paper: |
|
|
| ```bibtex |
| @misc{yang2026decmemminutelongconsistentworld, |
| title={DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory}, |
| author={Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong}, |
| year={2026}, |
| eprint={2605.31336}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2605.31336}, |
| } |
| ``` |