	<div align="center">

	# Echo-Infinity
	### Learnable Evolving Memory for Real-Time Infinite Video Generation

	<h3 align="center"><a href="https://arxiv.org/abs/2606.04527">Paper</a> | <a href="https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/">Website</a> | <a href="https://huggingface.co/Echo-Team/Echo-Infinity">Models</a> | <a href="https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity">Code</a></h3>

	</div>

	-----

	Echo-Infinity demonstrates hour-scale, real-time video generation with a learnable memory that filters, abstracts, and compresses history of any length at constant cost, suggesting a practical path toward infinite video generation.

	-----

	<table align="center">
	<tr>
	<td align="center" width="50%">
	<a href="https://www.youtube.com/watch?v=YR7G_yJs8WM">
	<img src="https://img.youtube.com/vi/YR7G_yJs8WM/hqdefault.jpg" alt="24h Demo — Part 1 / 2" width="100%"/>
	<br/>
	<sub><b>24h Demo — Part 1 / 2</b></sub>
	</a>
	</td>
	<td align="center" width="50%">
	<a href="https://www.youtube.com/watch?v=kF2Nksvijb8">
	<img src="https://img.youtube.com/vi/kF2Nksvijb8/hqdefault.jpg" alt="24h Demo — Part 2 / 2" width="100%"/>
	<br/>
	<sub><b>24h Demo — Part 2 / 2</b></sub>
	</a>
	</td>
	</tr>
	</table>

	<p align="center">
	<sub><i>Note: Each 24-hour demo is too large to host inline, so it is viewable only on YouTube. Each clip is split into two consecutive 12-hour parts because of YouTube's per-video duration limit, and the visual quality has been moderately compressed to reduce upload bandwidth.</i></sub>
	</p>


	## 🔥 News
	- 2026.06.03: The [paper](https://arxiv.org/abs/2606.04527), [project page](https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/), [model](https://huggingface.co/Echo-Team/Echo-Infinity), and [code](https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity) are released.


	## Quick Start

	### Installation

	```bash
	conda create -n echo_infinity python=3.10 -y
	conda activate echo_infinity

	cd Echo-Infinity
	pip install -r requirements.txt
	pip install flash-attn --no-build-isolation
	python setup.py develop
	```

	### Download Checkpoints

	```bash
	# Wan2.1 base models (teacher / student backbones)
	hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
	hf download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B

	# Stage-2 (Causal ODE) init from upstream Causal-Forcing
	hf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints

	# Echo-Infinity Stage-1 (init) and Stage-2 (long) checkpoints
	hf download Echo-Team/Echo-Infinity echo_infinity.pt --local-dir checkpoints
	hf download Echo-Team/Echo-Infinity echo_infinity-long.pt --local-dir checkpoints
	```
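
Before running inference, it can help to confirm that the downloads landed where the commands above put them. The following is a minimal sketch, not part of the repository: the `check_paths` helper is hypothetical, and the `checkpoints/chunkwise/causal_forcing.pt` location assumes that `hf download` keeps the in-repo subdirectory under `--local-dir`.

```bash
# Hypothetical helper (not part of the repo): report which expected paths exist.
# Prints one line per path and returns the number of missing paths.
check_paths() {
  missing=0
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok       $p"
    else
      echo "MISSING  $p"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Paths implied by the download commands above.
# ASSUMPTION: causal_forcing.pt keeps its "chunkwise/" prefix under checkpoints/.
check_paths \
  wan_models/Wan2.1-T2V-1.3B \
  wan_models/Wan2.1-T2V-14B \
  checkpoints/chunkwise/causal_forcing.pt \
  checkpoints/echo_infinity.pt \
  checkpoints/echo_infinity-long.pt \
  || echo "Some downloads are missing; re-run the matching command above."
```

Run it from `Echo-Infinity/` so the relative paths resolve the same way the download commands wrote them.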

	### CLI Inference

	All commands assume the current working directory is `Echo-Infinity/`.

	5s — short video (single-prompt, EMA on):
	```bash
	CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
	--config_path configs/echo_infinity_inference_std.yaml \
	--use_ema \
	--output_folder output/5s \
	--seed 0
	```

	30s — mid-length video (single-prompt):
	```bash
	CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
	--config_path configs/echo_infinity-long_inference.yaml \
	--output_folder output/30s \
	--seed 0
	```

	240s — long video (single-prompt):
	```bash
	CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
	--config_path configs/echo_infinity-long_inference_240s.yaml \
	--output_folder output/240s \
	--seed 0
	```

	60s interactive (multi-prompt switching within one video):
	```bash
	CUDA_VISIBLE_DEVICES=0 python inference/interactive_inference.py \
	--config_path configs/echo_infinity-long_interactive.yaml \
	--output_folder output/60s_interactive \
	--seed 1
	```

	1h — hour-level video (streaming decode):
	```bash
	bash inference/stream_long/run_1h.sh
	```

	24h — full-day video (streaming decode):
	```bash
	bash inference/stream_long/run_24h.sh
	```

	Demo prompts are in `inference/prompts/demo_*.txt`, and the switch prompts for the interactive demo are in `inference/prompts/demo_60s_interactive.jsonl`. To use your own prompts, pass `--data_path your_prompts.txt`.


	## Training

	The pipeline has two stages of DMD (distribution matching distillation) training. Both launch on 4 nodes × 8 GPUs by default (32 GPUs with `gradient_accumulation_steps=2`, for an effective batch size of 64). Override the launch topology with the `MASTER_ADDR`, `NODE_IP_*`, `NNODES`, and `NPROC_PER_NODE` environment variables (e.g. `NNODES=1 NPROC_PER_NODE=8 bash scripts/train_echo_infinity_init.sh` for single-node training).
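
The default numbers are consistent with a per-GPU micro-batch of 1, since 4 × 8 × 2 = 64. The sketch below spells out that arithmetic and shows how many accumulation steps a smaller topology would need to keep the same effective batch size. The per-GPU batch size of 1 is an inference from those numbers, not a value stated here, and the `effective_batch` and `accum_steps_for` helpers are hypothetical.

```bash
# Hypothetical helpers (not part of the repo) for reasoning about launch topology.
# ASSUMPTION: per-GPU micro-batch size is 1, inferred from 4 x 8 x 2 = 64.
PER_GPU_BATCH=1

# effective_batch NNODES NPROC_PER_NODE GRAD_ACCUM_STEPS
effective_batch() {
  echo $(( $1 * $2 * $3 * PER_GPU_BATCH ))
}

# accum_steps_for TARGET_BATCH NNODES NPROC_PER_NODE
# Accumulation steps needed to reach TARGET_BATCH, rounded up.
accum_steps_for() {
  per_step=$(( $2 * $3 * PER_GPU_BATCH ))
  echo $(( ($1 + per_step - 1) / per_step ))
}

effective_batch 4 8 2     # default topology: prints 64
effective_batch 1 8 2     # single node, accumulation unchanged: prints 16
accum_steps_for 64 1 8    # steps needed to reach 64 on one node: prints 8
```

Under that assumption, a single-node run with `NNODES=1 NPROC_PER_NODE=8` would train at an effective batch size of 16 unless `gradient_accumulation_steps` is raised to 8 (or the launch script already compensates, which is worth checking in the script itself).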

	Weights & Biases logging is off by default. To enable it, set `USE_WANDB=1` and fill in `wandb_key` / `wandb_entity` in the corresponding config (`configs/echo_infinity.yaml`, `configs/echo_infinity-long.yaml`).

	### Stage 1 — Init

	```bash
	bash scripts/train_echo_infinity_init.sh
	```

	Output: `logs/echo_infinity/checkpoint_model_000400/model.pt`. To reuse it as the Stage-2 init or for inference, copy it to `checkpoints/echo_infinity.pt` (the path the configs expect) or pass `--checkpoint_path`.

	### Stage 2 — Long-Video Tuning

	```bash
	bash scripts/train_echo_infinity_long.sh
	```

	Output: `logs/echo_infinity-long/checkpoint_model_003200/model.pt`. To reuse it for the long-form inference above, copy it to `checkpoints/echo_infinity-long.pt` (the `lora_ckpt` path the configs expect) or pass `--lora_ckpt`.
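
The copy step described for both stages can be sketched as a small helper. This is illustrative only: the `install_ckpt` function is hypothetical, and the source and destination paths are the ones quoted in this README.

```bash
# Hypothetical helper (not part of the repo): copy a trained checkpoint to the
# path the configs expect, creating the destination directory if needed.
# install_ckpt SRC DST
install_ckpt() {
  if [ ! -f "$1" ]; then
    echo "skip: $1 not found (has that stage finished training?)" >&2
    return 1
  fi
  mkdir -p "$(dirname "$2")"
  cp "$1" "$2"
  echo "installed $2"
}

# Stage 1 output -> init checkpoint; Stage 2 output -> long-video checkpoint.
install_ckpt logs/echo_infinity/checkpoint_model_000400/model.pt \
  checkpoints/echo_infinity.pt || true
install_ckpt logs/echo_infinity-long/checkpoint_model_003200/model.pt \
  checkpoints/echo_infinity-long.pt || true
```

Note that `cp` overwrites any released checkpoint previously downloaded to the same path, so keep a backup if you want both.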

	Training data (download from the same HF repo as the model weights):

	```bash
	hf download Echo-Team/Echo-Infinity vidprom_filtered_extended.txt --local-dir prompts
	hf download Echo-Team/Echo-Infinity vidprom_filtered_extended_switch.txt --local-dir prompts
	```

	- `prompts/vidprom_filtered_extended.txt` — base prompts for streaming training
	- `prompts/vidprom_filtered_extended_switch.txt` — prompt-switch pairs for interactive training


	## Acknowledgements

	This codebase builds on the open-source implementations of:
	- [Wan2.1 (Wan-Video)](https://github.com/Wan-Video/Wan2.1)
	- [Causal-Forcing (thu-ml)](https://github.com/thu-ml/Causal-Forcing)
	- [LongLive (NVlabs)](https://github.com/NVlabs/LongLive)
	- [Self-Forcing (guandeh17)](https://github.com/guandeh17/Self-Forcing)


	## References

	```bibtex
	@article{bian2026echoinfinity,
	title={Echo-Infinity: Learnable Evolving Memory for Real-Time Infinite Video Generation},
	author={Bian, Yuxuan and Xue, Zeyue and Zhang, Songchun and Zhang, Shiyi and Jin, Weiyang and Li, Yaowei and Zhuang, Junhao and Li, Haoran and Huang, Jie and Huang, Haoyang and Duan, Nan and Xu, Qiang},
	journal={arXiv preprint arXiv:2606.04527},
	year={2026}
	}
	```