# Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion

<a href="https://arxiv.org/abs/2603.13405"><img src='https://img.shields.io/badge/arXiv-2603.13405-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
<a href="https://vivocameraresearch.github.io/anchorforcing/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='page'></a>&nbsp;
<a href="http://www.apache.org/licenses/LICENSE-2.0"><img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'></a><br>

![overview](assets/motivation.jpg)
<p>
      πŸ“–<strong>TL;DR</strong>: <strong>Anchor Forcing</strong> enables prompt switches to introduce new subjects and actions while preserving context, motion quality, and temporal coherence; prior methods often degrade over time and miss newly specified interactions.

</p>


## πŸ“’ News
- **[2026-03-18]** πŸŽ‰ We have officially released the code for public use!  


## βœ… ToDo List for Anchor Forcing Release

- [x] Release the code
- [x] Release the inference pipeline
- [x] Release the training files
- [x] Release the model weights

## :wrench: Installation
We tested this repo on the following setup:
* Nvidia GPU with at least 40 GB memory (A100 tested).
* Linux operating system.
* 64 GB RAM.

Other hardware setups may also work but have not been tested.

**Environment**

Create a conda environment and install dependencies:
```shell
git clone https://github.com/vivoCameraResearch/Anchor-Forcing.git
cd Anchor-Forcing

conda create -n af python=3.10 -y
conda activate af

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# Alternatively, install flash-attention manually from a prebuilt wheel (recommended version: 2.7.4.post1):
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
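Before moving on to inference, you may want to confirm that the key packages resolve in the `af` environment. The snippet below is a generic sanity check (not part of the Anchor Forcing codebase) that reports whether each dependency is importable without actually loading it:

```python
# Generic dependency sanity check -- not part of the Anchor Forcing repo.
import importlib.util


def check_deps(names=("torch", "torchvision", "flash_attn")):
    """Return a {package: installed?} map without importing the (heavy) modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_deps().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If any package reports `MISSING`, re-run the corresponding `pip install` step above before launching inference.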

## ⏬ Demo Inference 

**Download Wan2.1-T2V-1.3B**
```shell
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
```

**Download checkpoints**

```shell
huggingface-cli download young98/AnchorForcing --local-dir ckpt
```

**Single Prompt Video Generation**
```shell
bash inference/inference.sh
```
**Interactive Long Video Generation**
```shell
python inference/interactive_inference.py
```


## Training
**Download checkpoints**

Please follow [Self-Forcing](https://github.com/guandeh17/Self-Forcing) to download the text prompts and the ODE-initialized checkpoint.

Download Wan2.1-T2V-14B as the teacher model.

```shell
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
```

**Step 1: Self-Forcing Initialization for Short Window and Frame Sink**

Please follow [LongLive](https://nvlabs.github.io/LongLive/docs/#training:~:text=Step1%3A%20Self%2DForcing%20Initialization%20for%20Short%20Window%20and%20Frame%20Sink) for this step.

**Step 2: Streaming Long Tuning**
```shell
bash train.sh
```

**Hints**

This repository only provides the training code for Step 2. For Step 1 we follow LongLive's training method by default, so you can train Step 2 directly from LongLive's checkpoints.

## πŸ“œ Acknowledgement
This codebase builds on [LongLive](https://github.com/NVlabs/LongLive); thanks to the authors for open-sourcing it! We also acknowledge the following great open-source projects:
- [MemFlow](https://github.com/KlingAIResearch/MemFlow): we follow its interactive video benchmark.
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing): we use its VBench prompts and checkpoints.


## 🌏 Citation

```bibtex
@article{yang2026anchor,
  title={Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion},
  author={Yang, Yang and Zhang, Tianyi and Huang, Wei and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
  journal={arXiv preprint arXiv:2603.13405},
  year={2026}
}
```

## πŸ“§ Contact

If you have any questions or suggestions, please email Yang Yang (yangyang98@zju.edu.cn) or open an issue.