---
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
language:
- en
license: apache-2.0
pipeline_tag: text-to-video
---

# DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

DecMem is a decoupled memory architecture designed for consistent, long-horizon world generation. It employs **Sparse Global Memory** for efficient fine-grained access to global history and **Anchored Local Memory** for stable and high-quality extrapolation. This approach enables minute-level controllable long video generation with high fidelity and consistency.

[**Project Page**](https://jeffreyyzh.github.io/DecMem-Page/) | [**Paper**](https://arxiv.org/abs/2605.31336) | [**Code**](https://github.com/KlingAIResearch/DecMem)
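The following toy retrieval sketch makes the split between the two memories concrete. It is not the DecMem implementation, and it rests on several assumptions of ours:

- Each previously generated chunk is summarized by a feature vector.
- The sparse global memory picks the `top_k` older chunks most similar (by cosine similarity) to the chunk being generated.
- The anchored local memory keeps a fixed anchor chunk (`anchor_idx`) plus the most recent `local_window` chunks.

The function `select_memory` and all of its parameters are hypothetical.

```python
import numpy as np


def select_memory(history, query, top_k=2, local_window=2, anchor_idx=0):
    """Hypothetical illustration of decoupled memory retrieval (not DecMem's code).

    history: (n, d) array of per-chunk features for already generated chunks.
    query:   (d,) feature of the chunk about to be generated.
    Returns (global_ids, local_ids): sorted indices into history.
    """
    n = history.shape[0]
    if n == 0:
        return [], []
    # Assumed "anchored local memory": a fixed anchor chunk plus the most recent chunks.
    recent = list(range(max(0, n - local_window), n))
    local_ids = sorted(set([anchor_idx] + recent))
    # Assumed "sparse global memory": the top-k older chunks most similar to the query,
    # excluding anything already covered by the local memory.
    candidates = [i for i in range(n) if i not in local_ids]
    if not candidates:
        return [], local_ids
    cand = history[candidates]
    norms = np.linalg.norm(cand, axis=1) * np.linalg.norm(query)
    sims = (cand @ query) / (norms + 1e-8)
    order = np.argsort(-sims)[:top_k]
    global_ids = sorted(int(candidates[i]) for i in order)
    return global_ids, local_ids
```

In this sketch, the number of chunks the generator would attend to stays bounded by `top_k` plus the local window, however long the history grows. The similarity scoring itself still scans every candidate chunk.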

## Checkpoints

Download the Wan2.1 backbone (VAE + tokenizer weights used by the pipeline):

```bash
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
    --local-dir-use-symlinks False \
    --local-dir wan_models/Wan2.1-T2V-1.3B
```

Download DecMem trained checkpoints:

```bash
huggingface-cli download KlingTeam/DecMem --local-dir checkpoints
```

Checkpoint layout expected by training / inference scripts:

```
checkpoints/
└── decmem.pt             # released weights
```
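A quick sanity check can confirm that both downloads landed where the commands above put them before you run the scripts. This helper is our own, not part of the DecMem repository, and `missing_paths` is a hypothetical name. It checks only the two paths named in this README (the Wan2.1 backbone directory and `checkpoints/decmem.pt`), not the individual files inside the backbone.

```python
from pathlib import Path

# Paths taken from the download commands in this README.
EXPECTED = {
    "wan_models/Wan2.1-T2V-1.3B": "dir",  # Wan2.1 backbone (VAE + tokenizer weights)
    "checkpoints/decmem.pt": "file",      # released DecMem weights
}


def missing_paths(base="."):
    """Return the expected paths (relative to base) that do not exist."""
    root = Path(base)
    missing = []
    for rel, kind in EXPECTED.items():
        path = root / rel
        ok = path.is_dir() if kind == "dir" else path.is_file()
        if not ok:
            missing.append(rel)
    return missing
```

Run `missing_paths()` from the repository root. An empty list means both locations are in place.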

## Quick start

We provide example video-pose pairs for quick inference. Inference proceeds block by block: each block of frames is denoised causally, conditioned on previously generated blocks through a KV cache.

To run inference, follow the installation instructions in the [official repository](https://github.com/KlingAIResearch/DecMem), then run:

```bash
bash scripts/infer_example.sh
```
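A toy NumPy model can illustrate block-by-block causal denoising with a KV cache. This is only a sketch of the general technique under our own assumptions:

- Random projections stand in for the trained network.
- The update rule is a simple interpolation, not DecMem's actual sampler.
- `ToyBlockCausalGenerator`, `generate_block`, `block_len`, and `num_steps` are hypothetical names.

What the sketch does show is the structure. Each new block starts from noise and is refined over several steps while attending to cached keys and values from earlier blocks. The block is appended to the cache only once it is finished.

```python
import numpy as np


def attention(q, k, v):
    """Scaled dot-product attention. q: (n_q, d); k, v: (n_kv, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


class ToyBlockCausalGenerator:
    """Hypothetical toy model: blocks are generated in order, and each one is
    denoised while attending to a KV cache of all previously finished blocks."""

    def __init__(self, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        # Random projections stand in for a trained network (assumption).
        self.wq, self.wk, self.wv = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
        )
        self.dim = dim
        self.k_cache = np.zeros((0, dim))
        self.v_cache = np.zeros((0, dim))

    def _denoise_step(self, x, alpha):
        # The current block attends to cached past blocks plus itself;
        # future blocks do not exist yet, so attention is block-causal.
        q = x @ self.wq
        k = np.concatenate([self.k_cache, x @ self.wk])
        v = np.concatenate([self.v_cache, x @ self.wv])
        prediction = attention(q, k, v)
        # Toy update that moves the block toward the prediction;
        # a real sampler would follow a diffusion/flow schedule instead.
        return x + alpha * (prediction - x)

    def generate_block(self, block_len, num_steps, rng):
        x = rng.standard_normal((block_len, self.dim))  # start from noise
        for step in range(num_steps):
            x = self._denoise_step(x, alpha=1.0 / (num_steps - step))
        # Cache the finished block's keys/values once, so later blocks
        # reuse them instead of recomputing the whole history.
        self.k_cache = np.concatenate([self.k_cache, x @ self.wk])
        self.v_cache = np.concatenate([self.v_cache, x @ self.wv])
        return x
```

Suppose you call `generate_block` three times with `block_len=4` on one generator. That leaves 12 cached key rows in `k_cache`, and earlier blocks are never recomputed, which is the point of the cache.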

## Citation
If you find our work helpful, please cite our paper:

```bibtex
@misc{yang2026decmemminutelongconsistentworld,
      title={DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory}, 
      author={Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong},
      year={2026},
      eprint={2605.31336},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.31336}, 
}
```