+---
+license: apache-2.0
+library_name: pytorch
+pipeline_tag: text-to-video
+tags:
+  - text-to-video
+  - video-generation
+  - streaming
+  - self-forcing
+  - wan2.1
+  - 3d-aware
+base_model: Wan-AI/Wan2.1-T2V-1.3B
+---
+# EndlessWorld — Real-Time 3D-Aware Long Video Generation
+Checkpoint for **EndlessWorld**, a streaming video diffusion model that produces
+*unbounded-length*, 3D-consistent videos in real time on a single GPU.
+- **Paper:** [arXiv:2512.12430](https://arxiv.org/abs/2512.12430)
+- **Code:** [github.com/BWGZK-keke/EndlessWorld](https://github.com/BWGZK-keke/EndlessWorld)
+- **Base model:** [Wan-AI/Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
+- **3D encoder:** [lhjiang/anysplat](https://huggingface.co/lhjiang/anysplat)
+## What's in this repo
+| File       | Description                                                             |
+|------------|-------------------------------------------------------------------------|
+| `model.pt` | DMD-distilled generator weights for the EndlessWorld causal Wan model (step 1000 of the `self_forcing_dmd_separate` SOTA run). |
+
+This is the generator checkpoint only. To run inference you also need:
+1. The Wan2.1-T2V-1.3B base weights (text encoder, VAE, etc.)
+2. The AnySplat 3D Gaussian feature encoder
+
+See the [GitHub README](https://github.com/BWGZK-keke/EndlessWorld#installation)
+for the full setup.
+## Method
+EndlessWorld extends the **Self-Forcing** causal diffusion framework
+(Wan2.1-T2V-1.3B backbone) with a **Global 3D-Aware Attention** module. The
+module injects scene geometry, extracted on the fly by AnySplat, into the
+conditional embedding of every autoregressive chunk.
+```
+                ┌──────────────────┐
+   prompt  ─►   │   text encoder   │
+                └────────┬─────────┘
+                         │ original_embed
+                         ▼
+   3D feature ──► [ CrossAttentionFusion ] ──► prompt_embeds ──► causal Wan generator
+        ▲                                                              │
+        │                                                              ▼
+        └─────── AnySplat( decoded RGB chunk ) ◄─── VAE.decode (latents)
+                                                  │
+                       autoregressive loop  ◄─────┘
+```
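The fusion box in the diagram can be illustrated with a toy single-head cross-attention. This is a minimal sketch under stated assumptions, not the repository's `CrossAttentionFusion`: it assumes queries come from the text embedding and keys/values from the 3D features, uses no learned projections, and adds a residual connection. The function name `cross_attention_fuse` and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(text_tokens, scene_tokens):
    """Fuse scene features into text tokens with one attention head.

    Assumption: text tokens act as queries, scene tokens as keys and values.
    """
    dim = len(text_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in text_tokens:
        # Scaled dot-product similarity of this text token to every scene token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in scene_tokens]
        weights = softmax(scores)
        # Attention output: weighted average of the scene tokens.
        context = [
            sum(w * value[i] for w, value in zip(weights, scene_tokens))
            for i in range(dim)
        ]
        # Residual connection (an assumption) keeps the original prompt information.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


original_embed = [[1.0, 0.0], [0.0, 1.0]]               # two toy text tokens
scene_feature = [[0.5, 0.5], [2.0, -1.0], [0.0, 3.0]]   # three toy 3D tokens
prompt_embeds = cross_attention_fuse(original_embed, scene_feature)
print(len(prompt_embeds), len(prompt_embeds[0]))        # same shape as original_embed
```

Because the text tokens supply the queries, the fused output keeps the token count of the text embedding regardless of how many scene tokens are supplied, so it could stand in for the plain text embedding downstream.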
+Three ingredients:
+- **Conditional autoregressive (self-forcing) training** — frames are denoised
+  block-by-block with KV-cache, conditioning each new block on previously
+  generated content.
+- **Global 3D-Aware Attention** — `CrossAttentionFusion` + `To3D` modules ingest
+  3D Gaussian features produced by AnySplat and fuse them with the text
+  embedding, giving the generator a persistent geometric memory of the world
+  rendered so far.
+- **Real-time streaming inference** — the rollout loop re-extracts 3D features
+  from the most recently decoded chunk and feeds the fused embedding back into
+  the causal generator, enabling indefinite extension on a single GPU.
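The control flow of the streaming rollout described above can be sketched with stub components. Everything here is illustrative: `encode_text`, `fuse`, `generate_chunk`, `decode`, and `extract_scene` are hypothetical stand-ins for the text encoder, `CrossAttentionFusion`, the causal generator, the VAE decoder, and AnySplat. The rolling cache window (`cache_blocks`) and the empty scene feature for the first chunk are assumptions, not documented behavior.

```python
from collections import deque
from itertools import islice


def encode_text(prompt):
    # Stub for the text encoder: returns a toy "embedding".
    return {"prompt": prompt}


def fuse(original_embed, scene_feature):
    # Stub for the fusion module: attach the latest scene feature, if any.
    return {**original_embed, "scene": scene_feature}


def generate_chunk(prompt_embeds, kv_cache):
    # Stub for the causal generator: records what the chunk was conditioned on.
    index = kv_cache[-1]["index"] + 1 if kv_cache else 0
    return {"index": index, "scene": prompt_embeds["scene"], "context": len(kv_cache)}


def decode(latents):
    # Stub for VAE decoding: latents -> RGB frames.
    return f"rgb[{latents['index']}]"


def extract_scene(frames):
    # Stub for the 3D encoder: RGB frames -> 3D Gaussian features.
    return f"splat({frames})"


def stream(prompt, cache_blocks=3):
    """Yield decoded chunks forever; the caller decides when to stop."""
    original_embed = encode_text(prompt)
    kv_cache = deque(maxlen=cache_blocks)  # assumption: rolling window of past blocks
    scene_feature = None                   # assumption: no geometry before chunk 0
    while True:
        prompt_embeds = fuse(original_embed, scene_feature)
        latents = generate_chunk(prompt_embeds, kv_cache)
        kv_cache.append(latents)
        frames = decode(latents)
        scene_feature = extract_scene(frames)  # feeds the next iteration
        yield frames, latents


chunks = list(islice(stream("a walk through a forest"), 5))
for frames, latents in chunks:
    print(frames, "conditioned on", latents["scene"], "with", latents["context"], "cached blocks")
```

Writing the loop as a generator leaves the video length to the consumer, which matches the unbounded-length framing. The fixed-size `deque` is one way to keep per-chunk memory constant in such a loop.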
+## Quick start
+```bash
+git clone https://github.com/BWGZK-keke/EndlessWorld
+cd EndlessWorld
+pip install -r requirements.txt
+# Download this checkpoint
+huggingface-cli download BWGZK/EndlessWorld model.pt --local-dir checkpoints/
+# Edit configs/self_forcing_dmd.yaml and set generator_ckpt: checkpoints/model.pt
+bash test.sh
+```
+Loading directly from Python:
+```python
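+import torch
+from huggingface_hub import hf_hub_download
+
+# Download model.pt into the local Hugging Face cache and return its path.
+ckpt = hf_hub_download(repo_id="BWGZK/EndlessWorld", filename="model.pt")
+# Load on CPU first; move tensors to the GPU after building the model.
+state_dict = torch.load(ckpt, map_location="cpu")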
+```
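The key layout of `model.pt` is not documented here, so it can help to inspect the loaded object before passing it to a model. The sketch below assumes the file holds a flat mapping of dotted parameter names; `summarize_keys` and the example key names are hypothetical and only illustrate the idea.

```python
from collections import Counter


def summarize_keys(state_dict, depth=1):
    """Count entries per leading key prefix, e.g. 'blocks' for 'blocks.0.attn.weight'."""
    counts = Counter()
    for key in state_dict:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Toy mapping with made-up key names; replace with the loaded `state_dict`.
example = {
    "blocks.0.attn.weight": None,
    "blocks.0.attn.bias": None,
    "blocks.1.attn.weight": None,
    "fusion.proj.weight": None,
}
print(summarize_keys(example))
print(summarize_keys(example, depth=2))
```

If the top-level keys turn out to be wrapper names rather than module names, the real weights are probably nested one level down and need to be unwrapped first.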
+## Training
+- **Run:** `self_forcing_dmd_separate` (DMD distillation, separate fake-score
+  network), step **1000**.
+- **Framework:** Multi-GPU FSDP via the [`train.py`](https://github.com/BWGZK-keke/EndlessWorld/blob/main/train.py)
+  entry point with [`configs/self_forcing_dmd.yaml`](https://github.com/BWGZK-keke/EndlessWorld/blob/main/configs/self_forcing_dmd.yaml).
+## Citation
+```bibtex
+@article{zhang2025endlessworld,
+  title   = {Endless World: Real-Time 3D-Aware Long Video Generation},
+  author  = {Zhang, Ke and others},
+  journal = {arXiv preprint arXiv:2512.12430},
+  year    = {2025}
+}
+```
+## License
+Apache 2.0 — same as the upstream Wan2.1 and Self-Forcing projects.