| --- |
| pipeline_tag: text-to-video |
| --- |
| |
| <div align="center"> |
|
|
| # Echo-Infinity |
| ### Learnable Evolving Memory for Real-Time Infinite Video Generation |
|
|
| <h3 align="center"><a href="https://arxiv.org/abs/2606.04527">Paper</a> | <a href="https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/">Website</a> | <a href="https://huggingface.co/Echo-Team/Echo-Infinity">Models</a> | <a href="https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity">Code</a></h3> |
|
|
| </div> |
|
|
| ----- |
|
|
| Echo-Infinity demonstrates hour-scale, real-time video generation with a learnable memory that filters, abstracts, and compresses history of any length at constant cost, suggesting a practical path toward infinite video generation. |
|
|
| ----- |
|
|
| <table align="center"> |
| <tr> |
| <td align="center" width="50%"> |
| <a href="https://www.youtube.com/watch?v=YR7G_yJs8WM"> |
| <img src="https://img.youtube.com/vi/YR7G_yJs8WM/hqdefault.jpg" alt="24h Demo - Part 1 / 2" width="100%"/> |
| <br/> |
| <sub><b>24h Demo - Part 1 / 2</b></sub> |
| </a> |
| </td> |
| <td align="center" width="50%"> |
| <a href="https://www.youtube.com/watch?v=kF2Nksvijb8"> |
| <img src="https://img.youtube.com/vi/kF2Nksvijb8/hqdefault.jpg" alt="24h Demo - Part 2 / 2" width="100%"/> |
| <br/> |
| <sub><b>24h Demo - Part 2 / 2</b></sub> |
| </a> |
| </td> |
| </tr> |
| </table> |
| |
| <p align="center"> |
| <sub><i>Note: Each 24-hour demo is too large to host inline, so it is available only on YouTube. Each demo is split into two consecutive 12-hour parts because of YouTube's per-video duration limit, and the video has been moderately compressed to reduce upload size.</i></sub> |
| </p> |
|
|
|
|
| ## 🔥 News |
| - **2026.06.03**: The [paper](https://arxiv.org/abs/2606.04527), [project page](https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/), [model](https://huggingface.co/Echo-Team/Echo-Infinity), and [code](https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity) are released. |
|
|
|
|
| ## Quick Start |
|
|
| ### Installation |
|
|
| ```bash |
| conda create -n echo_infinity python=3.10 -y |
| conda activate echo_infinity |
| |
| cd Echo-Infinity |
| pip install -r requirements.txt |
| pip install flash-attn --no-build-isolation |
| python setup.py develop |
| ``` |
|
|
| ### Download Checkpoints |
|
|
| ```bash |
| # Wan2.1 base models (teacher / student backbones) |
| hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B |
| hf download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B |
| |
| # Stage-2 (Causal ODE) init from upstream Causal-Forcing |
| hf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints |
| |
| # Echo-Infinity Stage-1 (init) and Stage-2 (long) checkpoints |
| hf download Echo-Team/Echo-Infinity echo_infinity.pt --local-dir checkpoints |
| hf download Echo-Team/Echo-Infinity echo_infinity-long.pt --local-dir checkpoints |
| ``` |
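Before running inference, it can help to confirm that the downloads landed where later commands look for them. The following is a minimal sketch, not part of the repository: `check_files` is a hypothetical helper, and the `checkpoints/chunkwise/causal_forcing.pt` location assumes that `hf download` keeps a file's repo-relative path under `--local-dir`.

```shell
# Hypothetical helper (not shipped with the repo): report which paths exist.
# Returns the number of missing paths as its exit status (0 = all present).
check_files() {
  missing=0
  for f in "$@"; do
    if [ -e "$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Paths follow the download commands above. The causal_forcing.pt location is
# an ASSUMPTION about how `hf download` lays out files under --local-dir.
check_files \
  wan_models/Wan2.1-T2V-1.3B \
  wan_models/Wan2.1-T2V-14B \
  checkpoints/chunkwise/causal_forcing.pt \
  checkpoints/echo_infinity.pt \
  checkpoints/echo_infinity-long.pt \
  || echo "Some files are missing; re-run the matching hf download line."
```

If the Causal-Forcing checkpoint turns out to sit at a different path, adjust the list (or the config that reads it) rather than moving files blindly.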
|
|
| ### CLI Inference |
|
|
| All commands assume the working directory is `Echo-Infinity/`. |
|
|
| **5s - short video** (single-prompt, EMA on): |
| ```bash |
| CUDA_VISIBLE_DEVICES=0 python inference/inference.py \ |
| --config_path configs/echo_infinity_inference_std.yaml \ |
| --use_ema \ |
| --output_folder output/5s \ |
| --seed 0 |
| ``` |
|
|
| **30s - mid-length video** (single-prompt): |
| ```bash |
| CUDA_VISIBLE_DEVICES=0 python inference/inference.py \ |
| --config_path configs/echo_infinity-long_inference.yaml \ |
| --output_folder output/30s \ |
| --seed 0 |
| ``` |
|
|
| **240s - long video** (single-prompt): |
| ```bash |
| CUDA_VISIBLE_DEVICES=0 python inference/inference.py \ |
| --config_path configs/echo_infinity-long_inference_240s.yaml \ |
| --output_folder output/240s \ |
| --seed 0 |
| ``` |
|
|
| **60s interactive** (multi-prompt switching within one video): |
| ```bash |
| CUDA_VISIBLE_DEVICES=0 python inference/interactive_inference.py \ |
| --config_path configs/echo_infinity-long_interactive.yaml \ |
| --output_folder output/60s_interactive \ |
| --seed 1 |
| ``` |
|
|
| **1h - hour-level video** (streaming decode): |
| ```bash |
| bash inference/stream_long/run_1h.sh |
| ``` |
|
|
| **24h - full-day video** (streaming decode): |
| ```bash |
| bash inference/stream_long/run_24h.sh |
| ``` |
|
|
| Demo prompts are in `inference/prompts/demo_*.txt`, and the prompt-switch schedule for the interactive demo is in `inference/prompts/demo_60s_interactive.jsonl`. To use your own prompts, pass `--data_path your_prompts.txt`. |
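As an illustration of the `--data_path` override, the sketch below prepares a custom prompts file and prints the matching 30s command instead of running it. The one-prompt-per-line layout is an assumption modeled on the demo `.txt` files, and the file name `my_prompts.txt` and folder `output/custom` are placeholders.

```shell
# Write a small prompts file. ASSUMPTION: one prompt per line, mirroring the
# demo .txt files; the exact format is not documented in this README.
cat > my_prompts.txt <<'EOF'
A paper boat drifting along a rain-filled gutter at dusk, cinematic lighting.
A lighthouse beam sweeping across a stormy sea at night, wide shot.
EOF

# Count non-empty lines so an empty file is caught before a long run.
num_prompts=$(grep -c . my_prompts.txt)
echo "prepared $num_prompts prompts in my_prompts.txt"

# Print (rather than run) the 30s command with the custom prompts file.
# All flags appear elsewhere in this README; only the file and folder names are new.
cat <<'EOF'
CUDA_VISIBLE_DEVICES=0 python inference/inference.py \
  --config_path configs/echo_infinity-long_inference.yaml \
  --data_path my_prompts.txt \
  --output_folder output/custom \
  --seed 0
EOF
```

Copy the printed command into a shell once the checkpoints are in place and a GPU is available.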
|
|
|
|
| ## Training |
|
|
| The pipeline has two stages of DMD training. Both are launched on 4 nodes × 8 GPUs by default (`gradient_accumulation_steps=2`, effective batch size 64). Override the launch topology via `MASTER_ADDR`, `NODE_IP_*`, `NNODES`, and `NPROC_PER_NODE` environment variables (e.g. `NNODES=1 NPROC_PER_NODE=8 bash scripts/train_echo_infinity_init.sh` for single-node training). |
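The batch-size arithmetic is worth checking when the topology changes. The default figures (4 × 8 GPUs, accumulation 2, effective batch 64) imply a per-GPU micro-batch of 1; that value is inferred here, not stated in the configs. `effective_batch` below is a hypothetical helper for the calculation only, and whether the training scripts adjust accumulation automatically on fewer GPUs is not covered here.

```shell
# Hypothetical helper: effective batch = nodes * GPUs per node * accumulation * per-GPU batch.
# usage: effective_batch NNODES NPROC_PER_NODE GRAD_ACCUM [PER_GPU_BATCH]
effective_batch() {
  # PER_GPU_BATCH defaults to 1, an ASSUMPTION inferred from 4 * 8 * 2 = 64.
  echo $(( $1 * $2 * $3 * ${4:-1} ))
}

effective_batch 4 8 2   # default topology            -> 64
effective_batch 1 8 2   # single node, same settings  -> 16
effective_batch 1 8 8   # single node, accumulation 8 -> 64
```

Under these assumptions, single-node training at the default accumulation sees a quarter of the default effective batch, so matching the default would mean raising `gradient_accumulation_steps` to 8.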
|
|
| Weights & Biases logging is **off by default**. To enable it, set `USE_WANDB=1` and fill in `wandb_key` / `wandb_entity` in the corresponding config (`configs/echo_infinity.yaml`, `configs/echo_infinity-long.yaml`). |
|
|
| ### Stage 1: Init |
|
|
| ```bash |
| bash scripts/train_echo_infinity_init.sh |
| ``` |
|
|
| Output: `logs/echo_infinity/checkpoint_model_000400/model.pt`. To reuse it as the Stage-2 init or for inference, copy it to `checkpoints/echo_infinity.pt` (the path the configs expect) or pass `--checkpoint_path`. |
|
|
| ### Stage 2: Long-Video Tuning |
|
|
| ```bash |
| bash scripts/train_echo_infinity_long.sh |
| ``` |
|
|
| Output: `logs/echo_infinity-long/checkpoint_model_003200/model.pt`. To reuse it for the long-form inference above, copy it to `checkpoints/echo_infinity-long.pt` (the `lora_ckpt` path the configs expect) or pass `--lora_ckpt`. |
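The copy step for both stages can be sketched as follows. `promote_ckpt` is a hypothetical helper, not a script in the repository; the source and destination paths are the ones given in the Stage 1 and Stage 2 notes.

```shell
# Hypothetical helper: copy a trained checkpoint to the path the configs expect.
# usage: promote_ckpt SRC DST
promote_ckpt() {
  if [ ! -f "$1" ]; then
    echo "skip: $1 not found" >&2
    return 1
  fi
  mkdir -p "$(dirname "$2")"   # create checkpoints/ if it does not exist yet
  cp "$1" "$2"                 # note: overwrites any existing file at DST
  echo "copied $1 -> $2"
}

# Stage 1 output -> init checkpoint used by Stage 2 and short-video inference.
promote_ckpt logs/echo_infinity/checkpoint_model_000400/model.pt \
  checkpoints/echo_infinity.pt || true

# Stage 2 output -> long-form checkpoint (the `lora_ckpt` path in the configs).
promote_ckpt logs/echo_infinity-long/checkpoint_model_003200/model.pt \
  checkpoints/echo_infinity-long.pt || true
```

Because `cp` overwrites the destination, back up any downloaded `checkpoints/*.pt` files first if you want to keep the released weights.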
|
|
| Training data (download from the same Hugging Face repo as the model weights): |
|
|
| ```bash |
| hf download Echo-Team/Echo-Infinity vidprom_filtered_extended.txt --local-dir prompts |
| hf download Echo-Team/Echo-Infinity vidprom_filtered_extended_switch.txt --local-dir prompts |
| ``` |
|
|
| - `prompts/vidprom_filtered_extended.txt`: base prompts for streaming training |
| - `prompts/vidprom_filtered_extended_switch.txt`: prompt-switch pairs for interactive training |
|
|
|
|
| ## Acknowledgements |
|
|
| This codebase builds on the open-source implementations of: |
| - [Wan2.1 (Wan-Video)](https://github.com/Wan-Video/Wan2.1) |
| - [Causal-Forcing (thu-ml)](https://github.com/thu-ml/Causal-Forcing) |
| - [LongLive (NVlabs)](https://github.com/NVlabs/LongLive) |
| - [Self-Forcing (guandeh17)](https://github.com/guandeh17/Self-Forcing) |
|
|
|
|
| ## References |
|
|
| ```bibtex |
| @article{bian2026echoinfinity, |
| title={Echo-Infinity: Learnable Evolving Memory for Real-Time Infinite Video Generation}, |
| author={Bian, Yuxuan and Xue, Zeyue and Zhang, Songchun and Zhang, Shiyi and Jin, Weiyang and Li, Yaowei and Zhuang, Junhao and Li, Haoran and Huang, Jie and Huang, Haoyang and Duan, Nan and Xu, Qiang}, |
| journal={arXiv preprint arXiv:2606.04527}, |
| year={2026} |
| } |
| ``` |