
<div align="center">

# Echo-Infinity
### Learnable Evolving Memory for Real-Time Infinite Video Generation

<h3 align="center"><a href="https://arxiv.org/abs/2606.04527">Paper</a> | <a href="https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/">Website</a> | <a href="https://huggingface.co/Echo-Team/Echo-Infinity">Models</a> | <a href="https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity">Code</a></h3>

</div>

-----

Echo-Infinity demonstrates hour-scale, real-time video generation using a learnable memory that filters, abstracts, and compresses history of any length at constant cost, suggesting a practical path toward infinite video generation.

-----

<table align="center">
<tr>
<td align="center" width="50%">
  <a href="https://www.youtube.com/watch?v=YR7G_yJs8WM">
    <img src="https://img.youtube.com/vi/YR7G_yJs8WM/hqdefault.jpg" alt="24h Demo — Part 1 / 2" width="100%"/>
    <br/>
    <sub><b>24h Demo — Part 1 / 2</b></sub>
  </a>
</td>
<td align="center" width="50%">
  <a href="https://www.youtube.com/watch?v=kF2Nksvijb8">
    <img src="https://img.youtube.com/vi/kF2Nksvijb8/hqdefault.jpg" alt="24h Demo — Part 2 / 2" width="100%"/>
    <br/>
    <sub><b>24h Demo — Part 2 / 2</b></sub>
  </a>
</td>
</tr>
</table>

<p align="center">
<sub><i>Note: Each 24-hour demo is too large to host inline, so it is available only on YouTube. Each clip is split into two consecutive 12-hour parts because of YouTube's per-video duration limit, and the video has been moderately compressed to reduce upload size.</i></sub>
</p>


## 🔥 News
- **2026.06.03**: The [paper](https://arxiv.org/abs/2606.04527), [project page](https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/), [model](https://huggingface.co/Echo-Team/Echo-Infinity), and [code](https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity) are released.


## Quick Start

### Installation

```bash
conda create -n echo_infinity python=3.10 -y
conda activate echo_infinity

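# Clone the repository if you have not already, then enter it
git clone https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity.git
cd Echo-Infinity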
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop
```

### Download Checkpoints

```bash
# Wan2.1 base models (teacher / student backbones)
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
hf download Wan-AI/Wan2.1-T2V-14B  --local-dir wan_models/Wan2.1-T2V-14B

# Stage-2 (Causal ODE) init from upstream Causal-Forcing
hf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints

# Echo-Infinity Stage-1 (init) and Stage-2 (long) checkpoints
hf download Echo-Team/Echo-Infinity echo_infinity.pt       --local-dir checkpoints
hf download Echo-Team/Echo-Infinity echo_infinity-long.pt  --local-dir checkpoints
```
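Before running inference, it can help to confirm that everything landed where expected. The following is a minimal sketch, not part of the repository: `check_downloads` is a hypothetical helper, and the paths assume that `hf download` keeps each file's in-repo path under `--local-dir` (so `chunkwise/causal_forcing.pt` ends up at `checkpoints/chunkwise/causal_forcing.pt`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which downloaded files/directories are present.
# Assumption: hf download preserves the in-repo path under --local-dir.
check_downloads() {
    local root="${1:-.}" missing=0 path
    for path in \
        wan_models/Wan2.1-T2V-1.3B \
        wan_models/Wan2.1-T2V-14B \
        checkpoints/chunkwise/causal_forcing.pt \
        checkpoints/echo_infinity.pt \
        checkpoints/echo_infinity-long.pt
    do
        if [ -e "$root/$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing entries (0 means all present).
    return "$missing"
}

# Run from the repository root.
check_downloads . || echo "Some downloads are missing; re-run the matching hf download command."
```

If a path is reported missing but the download succeeded, the file may simply sit in a different subdirectory; adjust the list to match your layout.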

### CLI Inference

All commands assume the current working directory is `Echo-Infinity/`.

**5s — short video** (single-prompt, EMA on):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
    --config_path configs/echo_infinity_inference_std.yaml \
    --use_ema \
    --output_folder output/5s \
    --seed 0
```

**30s — mid-length video** (single-prompt):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
    --config_path configs/echo_infinity-long_inference.yaml \
    --output_folder output/30s \
    --seed 0
```

**240s — long video** (single-prompt):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
    --config_path configs/echo_infinity-long_inference_240s.yaml \
    --output_folder output/240s \
    --seed 0
```

**60s interactive** (multi-prompt switching within one video):
```bash
CUDA_VISIBLE_DEVICES=0 python inference/interactive_inference.py \
    --config_path configs/echo_infinity-long_interactive.yaml \
    --output_folder output/60s_interactive \
    --seed 1
```

**1h — hour-level video** (streaming decode):
```bash
bash inference/stream_long/run_1h.sh
```

**24h — full-day video** (streaming decode):
```bash
bash inference/stream_long/run_24h.sh
```

Demo prompts are in `inference/prompts/demo_*.txt`, and the prompt-switch schedule for the interactive demo is in `inference/prompts/demo_60s_interactive.jsonl`. Override with `--data_path your_prompts.txt`.
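As an illustration of the `--data_path` override, the sketch below writes a small prompt file and passes it to the 5s command. The file name `my_prompts.txt`, the output folder `output/custom`, and the prompts themselves are placeholders, and the one-prompt-per-line layout is an assumption; check the bundled `demo_*.txt` files for the exact format.

```shell
# Placeholder prompts; one prompt per line is an assumption about the format.
cat > my_prompts.txt <<'EOF'
A red fox trotting through fresh snow at dawn, cinematic lighting.
A paper boat drifting down a rain-soaked street, shallow depth of field.
EOF

# Only launch when run from the repository root.
if [ -f inference/inference.py ]; then
    CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
        --config_path configs/echo_infinity_inference_std.yaml \
        --use_ema \
        --data_path my_prompts.txt \
        --output_folder output/custom \
        --seed 0
else
    echo "inference/inference.py not found; cd into Echo-Infinity/ first."
fi
```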


## Training

The pipeline has two stages of DMD training. Both are launched on 4 nodes × 8 GPUs by default (`gradient_accumulation_steps=2`, effective batch size 64). Override the launch topology via `MASTER_ADDR`, `NODE_IP_*`, `NNODES`, and `NPROC_PER_NODE` environment variables (e.g. `NNODES=1 NPROC_PER_NODE=8 bash scripts/train_echo_infinity_init.sh` for single-node training).
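When changing the topology, it is worth recomputing the effective batch size. The sketch below shows the arithmetic only; `effective_batch` is a hypothetical helper, and the per-GPU batch of 1 is inferred from 64 / (4 × 8 × 2) rather than stated in this README.

```shell
# Effective batch = nodes x GPUs per node x accumulation steps x per-GPU batch.
# Assumption: per-GPU batch defaults to 1 (inferred, not documented here).
effective_batch() {
    local nnodes="$1" nproc="$2" accum="$3" per_gpu="${4:-1}"
    echo $(( nnodes * nproc * accum * per_gpu ))
}

effective_batch 4 8 2   # default 4 x 8 topology: prints 64
effective_batch 1 8 2   # single node, same accumulation: prints 16
effective_batch 1 8 8   # single node, accumulation raised to 8: prints 64
```

Under these assumptions, single-node training with the default `gradient_accumulation_steps=2` would run at a quarter of the default effective batch size, so you may want to raise the accumulation steps to compensate.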

Weights & Biases logging is **off by default**. To enable it, set `USE_WANDB=1` and fill in `wandb_key` / `wandb_entity` in the corresponding config (`configs/echo_infinity.yaml`, `configs/echo_infinity-long.yaml`).

### Stage 1 — Init

```bash
bash scripts/train_echo_infinity_init.sh
```

Output: `logs/echo_infinity/checkpoint_model_000400/model.pt`. To reuse it as the Stage-2 init or for inference, copy it to `checkpoints/echo_infinity.pt` (the path the configs expect) or pass `--checkpoint_path`.

### Stage 2 — Long-Video Tuning

```bash
bash scripts/train_echo_infinity_long.sh
```

Output: `logs/echo_infinity-long/checkpoint_model_003200/model.pt`. To reuse it for the long-form inference above, copy it to `checkpoints/echo_infinity-long.pt` (the `lora_ckpt` path the configs expect) or pass `--lora_ckpt`.
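The copy steps for both stages can be scripted. This is a minimal sketch using the paths given above; `promote_checkpoint` is a hypothetical helper, not something shipped with the repository.

```shell
# Hypothetical helper: copy a training checkpoint to the path the configs expect.
# Fails loudly if the source checkpoint has not been written yet.
promote_checkpoint() {
    local src="$1" dst="$2"
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dst")"
    cp "$src" "$dst"
    echo "copied $src -> $dst"
}

# Stage 1 output, used as the Stage-2 init and for short-video inference.
promote_checkpoint logs/echo_infinity/checkpoint_model_000400/model.pt \
    checkpoints/echo_infinity.pt || true

# Stage 2 output, used as lora_ckpt for long-form inference.
promote_checkpoint logs/echo_infinity-long/checkpoint_model_003200/model.pt \
    checkpoints/echo_infinity-long.pt || true
```

Copying overwrites any checkpoint of the same name downloaded earlier, so back up the released weights first if you want to keep them.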

Training data (download from the same HF repo as the model weights):

```bash
hf download Echo-Team/Echo-Infinity vidprom_filtered_extended.txt        --local-dir prompts
hf download Echo-Team/Echo-Infinity vidprom_filtered_extended_switch.txt --local-dir prompts
```

- `prompts/vidprom_filtered_extended.txt`        — base prompts for streaming training
- `prompts/vidprom_filtered_extended_switch.txt` — prompt-switch pairs for interactive training


## Acknowledgements

This codebase builds on the open-source implementations of:
- [Wan2.1 (Wan-Video)](https://github.com/Wan-Video/Wan2.1)
- [Causal-Forcing (thu-ml)](https://github.com/thu-ml/Causal-Forcing)
- [LongLive (NVlabs)](https://github.com/NVlabs/LongLive)
- [Self-Forcing (guandeh17)](https://github.com/guandeh17/Self-Forcing)


## References

```bibtex
@article{bian2026echoinfinity,
  title={Echo-Infinity: Learnable Evolving Memory for Real-Time Infinite Video Generation},
  author={Bian, Yuxuan and Xue, Zeyue and Zhang, Songchun and Zhang, Shiyi and Jin, Weiyang and Li, Yaowei and Zhuang, Junhao and Li, Haoran and Huang, Jie and Huang, Haoyang and Duan, Nan and Xu, Qiang},
  journal={arXiv preprint arXiv:2606.04527},
  year={2026}
}
```