<div align="center">
# Echo-Infinity
### Learnable Evolving Memory for Real-Time Infinite Video Generation
<h3 align="center"><a href="https://arxiv.org/abs/2606.04527">Paper</a> | <a href="https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/">Website</a> | <a href="https://huggingface.co/Echo-Team/Echo-Infinity">Models</a> | <a href="https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity">Code</a></h3>
</div>
-----
Echo-Infinity demonstrates hour-scale, real-time video generation using a learnable memory that filters, abstracts, and compresses history of any length at constant cost, suggesting a practical path toward infinite video generation.
-----
<table align="center">
<tr>
<td align="center" width="50%">
<a href="https://www.youtube.com/watch?v=YR7G_yJs8WM">
<img src="https://img.youtube.com/vi/YR7G_yJs8WM/hqdefault.jpg" alt="24h Demo: Part 1 / 2" width="100%"/>
<br/>
<sub><b>24h Demo: Part 1 / 2</b></sub>
</a>
</td>
<td align="center" width="50%">
<a href="https://www.youtube.com/watch?v=kF2Nksvijb8">
<img src="https://img.youtube.com/vi/kF2Nksvijb8/hqdefault.jpg" alt="24h Demo: Part 2 / 2" width="100%"/>
<br/>
<sub><b>24h Demo: Part 2 / 2</b></sub>
</a>
</td>
</tr>
</table>
<p align="center">
<sub><i>Note: Each 24-hour demo is too large to host inline, so it is only viewable via YouTube. Each clip is split into two consecutive 12-hour parts due to YouTube's per-video duration limit, and visual quality has been moderately compressed for upload bandwidth efficiency.</i></sub>
</p>
## 🔥 News
- **2026.6.03**: The [paper](https://arxiv.org/abs/2606.04527), [project page](https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/), [model](https://huggingface.co/Echo-Team/Echo-Infinity), and [code](https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity) are released.
## Quick Start
### Installation
```bash
conda create -n echo_infinity python=3.10 -y
conda activate echo_infinity
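# Clone the code repository first if you have not already
git clone https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity
cd Echo-Infinity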
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop
```
### Download Checkpoints
```bash
# Wan2.1 base models (teacher / student backbones)
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
hf download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
# Stage-2 (Causal ODE) init from upstream Causal-Forcing
hf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints
# Echo-Infinity Stage-1 (init) and Stage-2 (long) checkpoints
hf download Echo-Team/Echo-Infinity echo_infinity.pt --local-dir checkpoints
hf download Echo-Team/Echo-Infinity echo_infinity-long.pt --local-dir checkpoints
```
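After the downloads finish, a quick existence check can catch a missing or misplaced file before a long run starts. The sketch below is illustrative and not part of the repository: `EXPECTED` and `missing_paths` are hypothetical names, and the `checkpoints/chunkwise/causal_forcing.pt` entry assumes that `hf download` keeps the repo sub-path under `--local-dir`.

```python
from pathlib import Path

# Paths implied by the download commands above, relative to Echo-Infinity/.
EXPECTED = [
    "wan_models/Wan2.1-T2V-1.3B",
    "wan_models/Wan2.1-T2V-14B",
    # Assumption: the repo sub-path "chunkwise/" is preserved under --local-dir.
    "checkpoints/chunkwise/causal_forcing.pt",
    "checkpoints/echo_infinity.pt",
    "checkpoints/echo_infinity-long.pt",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Prints one line per missing file or directory; prints nothing if all exist.
    for path in missing_paths("."):
        print(f"missing: {path}")
```

Run it from `Echo-Infinity/`; an empty output means every expected path was found.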
### CLI Inference
All commands assume the current working directory is `Echo-Infinity/`.
**5s: short video** (single-prompt, EMA on):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
--config_path configs/echo_infinity_inference_std.yaml \
--use_ema \
--output_folder output/5s \
--seed 0
```
**30s: mid-length video** (single-prompt):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
--config_path configs/echo_infinity-long_inference.yaml \
--output_folder output/30s \
--seed 0
```
**240s: long video** (single-prompt):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
--config_path configs/echo_infinity-long_inference_240s.yaml \
--output_folder output/240s \
--seed 0
```
**60s interactive** (multi-prompt switching within one video):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/interactive_inference.py \
--config_path configs/echo_infinity-long_interactive.yaml \
--output_folder output/60s_interactive \
--seed 1
```
**1h: hour-level video** (streaming decode):
```bash
bash inference/stream_long/run_1h.sh
```
**24h: full-day video** (streaming decode):
```bash
bash inference/stream_long/run_24h.sh
```
Demo prompts live in `inference/prompts/demo_*.txt`, and the switch prompts for the interactive demo are in `inference/prompts/demo_60s_interactive.jsonl`. To use your own prompts, pass `--data_path your_prompts.txt`.
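To run several of the single-prompt presets above in one go, the commands can be assembled programmatically. This is a minimal sketch that uses only the flags shown in this README (`--config_path`, `--use_ema`, `--output_folder`, `--seed`, `--data_path`); `build_inference_cmd` and `RUNS` are hypothetical helpers, not part of the repository.

```python
import shlex


def build_inference_cmd(config_path, output_folder, seed=0, use_ema=False,
                        data_path=None, script="inference/inference.py"):
    """Assemble the argument list for one inference run."""
    cmd = ["python", script, "--config_path", config_path]
    if use_ema:
        cmd.append("--use_ema")
    if data_path is not None:
        # Optional override of the default demo prompts.
        cmd += ["--data_path", data_path]
    cmd += ["--output_folder", output_folder, "--seed", str(seed)]
    return cmd


# The three single-prompt presets from the CLI section above.
RUNS = [
    dict(config_path="configs/echo_infinity_inference_std.yaml",
         output_folder="output/5s", use_ema=True),
    dict(config_path="configs/echo_infinity-long_inference.yaml",
         output_folder="output/30s"),
    dict(config_path="configs/echo_infinity-long_inference_240s.yaml",
         output_folder="output/240s"),
]

for run in RUNS:
    # Print shell-safe command lines; nothing is executed here.
    print(shlex.join(build_inference_cmd(**run)))
```

The script only prints the commands. To execute them, each list can be handed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment, as in the shell examples.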
## Training
The pipeline has two stages of DMD training. Both are launched on 4 nodes × 8 GPUs by default (`gradient_accumulation_steps=2`, effective batch size 64). Override the launch topology via the `MASTER_ADDR`, `NODE_IP_*`, `NNODES`, and `NPROC_PER_NODE` environment variables (e.g. `NNODES=1 NPROC_PER_NODE=8 bash scripts/train_echo_infinity_init.sh` for single-node training).
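Changing the topology also changes the effective batch size unless gradient accumulation is adjusted to compensate. The arithmetic can be sketched as follows, assuming a per-GPU micro-batch of 1 (inferred from 4 × 8 × 2 = 64, not stated in the configs shown here); `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(nnodes, nproc_per_node, grad_accum_steps, per_gpu_batch=1):
    """Samples contributing to one optimizer step under data parallelism."""
    return nnodes * nproc_per_node * grad_accum_steps * per_gpu_batch


# Default launch: 4 nodes x 8 GPUs, gradient_accumulation_steps=2.
default = effective_batch_size(4, 8, 2)

# Single node (NNODES=1 NPROC_PER_NODE=8) with accumulation left unchanged.
single_node = effective_batch_size(1, 8, 2)

# Accumulation steps a single 8-GPU node would need to match the default.
steps_to_match = default // (1 * 8 * 1)

print(default, single_node, steps_to_match)  # prints: 64 16 8
```

Under this assumption, a single 8-GPU node reaches the default effective batch size only with 8 accumulation steps; whether that trade-off preserves training behavior is not covered here.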
Weights & Biases logging is **off by default**. To enable it, set `USE_WANDB=1` and fill in `wandb_key` / `wandb_entity` in the corresponding config (`configs/echo_infinity.yaml`, `configs/echo_infinity-long.yaml`).
### Stage 1: Init
```bash
bash scripts/train_echo_infinity_init.sh
```
Output: `logs/echo_infinity/checkpoint_model_000400/model.pt`. To reuse it as the Stage-2 init or for inference, copy it to `checkpoints/echo_infinity.pt` (the path the configs expect) or pass `--checkpoint_path`.
### Stage 2: Long-Video Tuning
```bash
bash scripts/train_echo_infinity_long.sh
```
Output: `logs/echo_infinity-long/checkpoint_model_003200/model.pt`. To reuse it for the long-form inference above, copy it to `checkpoints/echo_infinity-long.pt` (the `lora_ckpt` path the configs expect) or pass `--lora_ckpt`.
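Copying a trained checkpoint to the path the configs expect can be scripted so that an existing file is never overwritten by accident. The helper below is a sketch, not part of the repository; `promote_checkpoint` is a hypothetical name, and the paths in the usage comments are the ones quoted in the two stages above.

```python
import shutil
from pathlib import Path


def promote_checkpoint(src, dst, overwrite=False):
    """Copy a training output to the path the inference configs expect."""
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    if dst.exists() and not overwrite:
        raise FileExistsError(f"{dst} already exists; pass overwrite=True to replace it")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies file contents and metadata
    return dst


# Usage, run from Echo-Infinity/ after training has produced the files:
# promote_checkpoint("logs/echo_infinity/checkpoint_model_000400/model.pt",
#                    "checkpoints/echo_infinity.pt")
# promote_checkpoint("logs/echo_infinity-long/checkpoint_model_003200/model.pt",
#                    "checkpoints/echo_infinity-long.pt")
```

If `checkpoints/` already holds the released weights downloaded earlier, the default `overwrite=False` keeps them intact and raises instead.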
Training data (downloaded from the same Hugging Face repo as the model weights):
```bash
hf download Echo-Team/Echo-Infinity vidprom_filtered_extended.txt --local-dir prompts
hf download Echo-Team/Echo-Infinity vidprom_filtered_extended_switch.txt --local-dir prompts
```
- `prompts/vidprom_filtered_extended.txt`: base prompts for streaming training
- `prompts/vidprom_filtered_extended_switch.txt`: prompt-switch pairs for interactive training
## Acknowledgements
This codebase builds on the open-source implementations of:
- [Wan2.1 (Wan-Video)](https://github.com/Wan-Video/Wan2.1)
- [Causal-Forcing (thu-ml)](https://github.com/thu-ml/Causal-Forcing)
- [LongLive (NVlabs)](https://github.com/NVlabs/LongLive)
- [Self-Forcing (guandeh17)](https://github.com/guandeh17/Self-Forcing)
## References
```bibtex
@article{bian2026echoinfinity,
title={Echo-Infinity: Learnable Evolving Memory for Real-Time Infinite Video Generation},
author={Bian, Yuxuan and Xue, Zeyue and Zhang, Songchun and Zhang, Shiyi and Jin, Weiyang and Li, Yaowei and Zhuang, Junhao and Li, Haoran and Huang, Jie and Huang, Haoyang and Duan, Nan and Xu, Qiang},
journal={arXiv preprint arXiv:2606.04527},
year={2026}
}
```