young98 committed
Commit 2e09160 · verified · 1 Parent(s): cdb4380

Upload 6 files

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ assets/motivation.jpg filter=lfs diff=lfs merge=lfs -text
37
+ prompts/train/vidprom_filtered_extended_switch.txt filter=lfs diff=lfs merge=lfs -text
38
+ prompts/train/vidprom_filtered_extended.txt filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,113 @@
1
- ---
2
- license: cc-by-nc-4.0
3
- ---
1
+ # Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion
2
+
3
+ <a href="https://arxiv.org/abs/2603.13405"><img src='https://img.shields.io/badge/arXiv-2603.13405-red?style=flat&logo=arXiv&logoColor=red' alt='arxiv'></a>&nbsp;
4
+ <a href="https://vivocameraresearch.github.io/anchorforcing/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='page'></a>&nbsp;
5
+ <a href="http://www.apache.org/licenses/LICENSE-2.0"><img src='https://img.shields.io/badge/License-CC BY--NC--SA--4.0-lightgreen?style=flat&logo=Lisence' alt='License'></a><br>
6
+
7
+ ![overview](assets/motivation.jpg)
8
+ <p>
9
+ 📖<strong>TL;DR</strong>: <strong>Anchor Forcing</strong> enables prompt switches to introduce new subjects and actions while preserving context, motion quality, and temporal coherence; prior methods often degrade over time and miss newly specified interactions.
10
+ </p>
11
+
12
+ ## 📢 News
13
+ - **[2026-03-18]** 🎉 We have officially released the code for public use!
14
+
15
+
16
+ ## ✅ ToDo List for Anchor Forcing Release
17
+
18
+ - [x] Release the code
19
+ - [x] Release the inference pipeline
20
+ - [x] Release the training files
21
+ - [x] Release the model weights
22
+
23
+ ## :wrench: Installation
24
+ We tested this repo on the following setup:
25
+ * NVIDIA GPU with at least 40 GB of memory (tested on an A100).
26
+ * Linux operating system.
27
+ * 64 GB RAM.
28
+
29
+ Other hardware setups may also work but have not been tested.
30
+
31
+ **Environment**
32
+
33
+ Create a conda environment and install dependencies:
34
+ ```
35
+ git clone https://github.com/vivoCameraResearch/Anchor-Forcing.git
36
+ cd Anchor-Forcing
37
+ conda create -n af python=3.10 -y
38
+ conda activate af
39
+ pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
40
+ pip install -r requirements.txt
41
+ pip install flash-attn --no-build-isolation
42
+
43
+ # Alternatively, install flash-attention manually from a prebuilt wheel (recommended version: 2.7.4.post1)
44
+ pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
45
+ ```
46
+
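Before launching inference, it can help to confirm that the machine resembles the tested setup. The following is a minimal sketch, not part of this repository: `check_setup` and `detect_gpu_memory` are hypothetical helper names, and the 40 GB threshold is interpreted as 40 × 10^9 bytes, which is an assumption.

```python
GB = 10 ** 9  # decimal gigabyte; assumed interpretation of "40 GB"


def check_setup(gpu_mem_bytes, python_version, min_gpu_gb=40, expected_python=(3, 10)):
    """Return a list of warnings; an empty list means the setup matches the tested one."""
    warnings = []
    if gpu_mem_bytes is None:
        warnings.append("no CUDA GPU detected")
    elif gpu_mem_bytes < min_gpu_gb * GB:
        warnings.append(
            f"GPU has {gpu_mem_bytes / GB:.1f} GB, below the tested {min_gpu_gb} GB"
        )
    if tuple(python_version[:2]) != expected_python:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} differs from the tested "
            f"{expected_python[0]}.{expected_python[1]}"
        )
    return warnings


def detect_gpu_memory():
    """Total memory of GPU 0 in bytes, or None if torch or CUDA is unavailable."""
    try:
        import torch  # imported lazily so the check itself runs without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory
```

Calling `check_setup(detect_gpu_memory(), sys.version_info)` inside the `af` environment returns an empty list when nothing differs from the tested configuration.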
47
+ ## ⏬ Demo Inference
48
+
49
+ **Download Wan2.1-T2V-1.3B**
50
+ ```
51
+ huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
52
+ ```
53
+
54
+ **Download checkpoints**
55
+
56
+ ```
57
+ huggingface-cli download young98/AnchorForcing --local-dir ckpt
58
+ ```
59
+
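The two downloads above can also be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package (which provides `huggingface-cli`) is installed; `DOWNLOADS` and `download_all` are illustrative names, not part of this repository.

```python
# Repository IDs and target directories copied from the CLI commands above.
DOWNLOADS = {
    "Wan-AI/Wan2.1-T2V-1.3B": "wan_models/Wan2.1-T2V-1.3B",
    "young98/AnchorForcing": "ckpt",
}


def download_all(downloads=DOWNLOADS):
    """Download each repository snapshot into its local directory and return the paths."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return [
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
        for repo_id, local_dir in downloads.items()
    ]
```

Running `download_all()` needs network access and enough disk space for both repositories.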
60
+ **Single Prompt Video Generation**
61
+ ```
62
+ bash inference/inference.sh
63
+ ```
64
+ **Interactive Long Video Generation**
65
+ ```
66
+ bash inference/interactive_inference.sh
67
+ ```
68
+
69
+
70
+ ## Training
71
+ **Download checkpoints**
72
+
73
+ Please follow [Self-Forcing](https://github.com/guandeh17/Self-Forcing) to download the text prompts and the ODE-initialized checkpoint.
74
+
75
+ Download Wan2.1-T2V-14B as the teacher model.
76
+
77
+ ```
78
+ huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
79
+ ```
80
+
81
+ **Step 1: Self-Forcing Initialization for Short Window and Frame Sink**
82
+
83
+ Please follow [LongLive](https://nvlabs.github.io/LongLive/docs/#training:~:text=Step1%3A%20Self%2DForcing%20Initialization%20for%20Short%20Window%20and%20Frame%20Sink) for this step.
84
+
85
+ **Step 2: Streaming Long Tuning**
86
+ ```
87
+ bash train.sh
88
+ ```
89
+
90
+ **Hints**
91
+
92
+ This repository provides training code for Step 2 only. Step 1 follows LongLive's training procedure, so you can train Step 2 directly from LongLive's checkpoints.
93
+
94
+ ## 📜 Acknowledgement
95
+ This codebase builds on [LongLive](https://github.com/NVlabs/LongLive); thanks to its authors for open-sourcing it. We also acknowledge the following open-source projects:
96
+ - [MemFlow](https://github.com/KlingAIResearch/MemFlow): We followed its interactive video benchmark.
97
+ - [Self-Forcing](https://github.com/guandeh17/Self-Forcing): We used its VBench prompts and checkpoints.
98
+
99
+
100
+ ## 🌏 Citation
101
+
102
+ ```bibtex
103
+ @article{yang2026anchor,
104
+ title={Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion},
105
+ author={Yang, Yang and Zhang, Tianyi and Huang, Wei and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
106
+ journal={arXiv preprint arXiv:2603.13405},
107
+ year={2026}
108
+ }
109
+ ```
110
+
111
+ ## 📧 Contact
112
+
113
+ If you have any questions or suggestions for improvement, please email Yang Yang (yangyang98@zju.edu.cn) or open an issue.
assets/motivation.jpg ADDED

Git LFS Details

  • SHA256: adc371456571e1dead0b4ce2ad16f2f4175c1a576d7b7b9cdcefa50bb4a8f3ee
  • Pointer size: 131 Bytes
  • Size of remote file: 166 kB
prompts/bench/interactive_benchmark.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
prompts/bench/vbench_all_dimension_extended.txt ADDED
The diff for this file is too large to render. See raw diff
 
prompts/train/vidprom_filtered_extended.txt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7896742f468bc8aef9e4547424d1ce0a951acdb2a82233790155401a99bf5aa5
3
+ size 145875068
prompts/train/vidprom_filtered_extended_switch.txt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1ddfbb03ec4e5919edadccc9cce9d21c4d5e4ffb0df816ee7725c92ff80c7e63
3
+ size 130974588
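
The two `prompts/train` files above are committed as Git LFS pointers (`version`, `oid`, `size`), so a clone without LFS support leaves only these three-line stubs on disk. The following is a minimal sketch for checking that a downloaded file matches its pointer; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only `sha256` object IDs are handled.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into (sha256_hex, size_bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 named by the pointer."""
    digest, size = parse_lfs_pointer(pointer_text)
    path = Path(path)
    if path.stat().st_size != size:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so files of ~100 MB or more are never read into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest
```

If the local file is itself still a pointer stub, the size check fails immediately, which usually means `git lfs pull` has not been run.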