BWGZK committed on
Commit ceb2093 · verified · 1 Parent(s): 2d589c8

Add model card linking to GitHub repo

Files changed (1)
  1. README.md +116 -0
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ license: apache-2.0
+ library_name: pytorch
+ pipeline_tag: text-to-video
+ tags:
+ - text-to-video
+ - video-generation
+ - streaming
+ - self-forcing
+ - wan2.1
+ - 3d-aware
+ base_model: Wan-AI/Wan2.1-T2V-1.3B
+ ---
+
+ # EndlessWorld: Real-Time 3D-Aware Long Video Generation
+
+ Checkpoint for **EndlessWorld**, a streaming video diffusion model that produces
+ *unbounded-length*, 3D-consistent videos in real time on a single GPU.
+
+ - **Paper:** [arXiv:2512.12430](https://arxiv.org/abs/2512.12430)
+ - **Code:** [github.com/BWGZK-keke/EndlessWorld](https://github.com/BWGZK-keke/EndlessWorld)
+ - **Base model:** [Wan-AI/Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
+ - **3D encoder:** [lhjiang/anysplat](https://huggingface.co/lhjiang/anysplat)
+
+ ## What's in this repo
+
+ | File | Description |
+ |------------|-------------------------------------------------------------------------|
+ | `model.pt` | DMD-distilled generator weights for the EndlessWorld causal Wan model (step 1000 of the `self_forcing_dmd_separate` SOTA run). |
+
+ This is the generator checkpoint only. To run inference you also need:
+ 1. The Wan2.1-T2V-1.3B base weights (text encoder, VAE, etc.)
+ 2. The AnySplat 3D Gaussian feature encoder
+
+ See the [GitHub README](https://github.com/BWGZK-keke/EndlessWorld#installation)
+ for the full setup.
+
+ ## Method
+
+ EndlessWorld extends the **Self-Forcing** causal diffusion framework (Wan2.1-T2V-1.3B
+ backbone) with a **Global 3D-Aware Attention** module that injects scene geometry,
+ extracted on the fly by AnySplat, into the conditional embedding of every
+ autoregressive chunk.
+
+ ```
+                 ┌──────────────────┐
+    prompt ─►    │  text encoder    │
+                 └────────┬─────────┘
+                          │ original_embed
+                          ▼
+    3D feature ──► [ CrossAttentionFusion ] ──► prompt_embeds ──► causal Wan generator
+         ▲                                                           │
+         │                                                           ▼
+         └─────── AnySplat( decoded RGB chunk ) ◄─── VAE.decode (latents)
+                                                              │
+                                    autoregressive loop ◄─────┘
+ ```
+
+ Three ingredients:
+
+ - **Conditional autoregressive (self-forcing) training:** frames are denoised
+   block by block with a KV cache, conditioning each new block on previously
+   generated content.
+ - **Global 3D-Aware Attention:** the `CrossAttentionFusion` and `To3D` modules
+   ingest 3D Gaussian features produced by AnySplat and fuse them with the text
+   embedding, giving the generator a persistent geometric memory of the world
+   rendered so far.
+ - **Real-time streaming inference:** the rollout loop re-extracts 3D features
+   from the most recently decoded chunk and feeds the fused embedding back into
+   the causal generator, enabling indefinite extension on a single GPU.
+
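The streaming loop described in the third ingredient can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the repository's implementation. `fuse`, `rollout`, and the three callables are hypothetical stand-ins for `CrossAttentionFusion`, the causal Wan generator, `VAE.decode`, and AnySplat. The weighted average stands in for real cross-attention, and conditioning the first chunk on the text embedding alone is an assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def fuse(text_embed: Vector, feat_3d: Vector, weight: float = 0.5) -> Vector:
    """Toy stand-in for CrossAttentionFusion: blend 3D features into the text embedding."""
    return [(1.0 - weight) * t + weight * g for t, g in zip(text_embed, feat_3d)]


def rollout(
    text_embed: Vector,
    generate_chunk: Callable[[Vector, Sequence[Vector]], Vector],
    decode: Callable[[Vector], Vector],
    extract_3d: Callable[[Vector], Vector],
    num_chunks: int,
) -> List[Vector]:
    """Autoregressive loop: generate, decode, extract 3D features, fuse, repeat."""
    history: List[Vector] = []   # plays the role of the KV cache (hypothetical)
    prompt_embeds = text_embed   # assumption: the first chunk sees text only
    frames: List[Vector] = []
    for _ in range(num_chunks):
        latents = generate_chunk(prompt_embeds, history)  # causal generator step
        history.append(latents)                           # condition later chunks on it
        rgb = decode(latents)                             # VAE.decode stand-in
        frames.append(rgb)
        # Re-extract geometry from the newest chunk and fold it into the condition.
        prompt_embeds = fuse(text_embed, extract_3d(rgb))
    return frames
```

Because `num_chunks` is the only bound on the loop, the same structure extends to arbitrarily long rollouts. A real implementation would also need to bound the cache size, which this sketch does not attempt.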
+ ## Quick start
+
+ ```bash
+ git clone https://github.com/BWGZK-keke/EndlessWorld
+ cd EndlessWorld
+ pip install -r requirements.txt
+
+ # Download this checkpoint
+ huggingface-cli download BWGZK/EndlessWorld model.pt --local-dir checkpoints/
+
+ # Update configs/self_forcing_dmd.yaml -> generator_ckpt: checkpoints/model.pt
+ bash test.sh
+ ```
+
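The config update in the commented step above can also be scripted. The helper below is a small sketch that uses only the standard library. It assumes `generator_ckpt` is a top-level scalar key in the YAML file, which the source does not confirm, and `set_top_level_key` is a hypothetical name. Any trailing comment on the rewritten line is dropped.

```python
import re


def set_top_level_key(yaml_text: str, key: str, value: str) -> str:
    """Replace the value of a top-level `key: value` line, or append the key if absent."""
    pattern = re.compile(rf"^{re.escape(key)}\s*:.*$", flags=re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(yaml_text):
        # A callable replacement avoids backslash escapes in `value` being interpreted.
        return pattern.sub(lambda _match: new_line, yaml_text, count=1)
    separator = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return f"{yaml_text}{separator}{new_line}\n"


# Example usage (paths taken from the quick start; run from the repository root):
# from pathlib import Path
# cfg = Path("configs/self_forcing_dmd.yaml")
# cfg.write_text(set_top_level_key(cfg.read_text(), "generator_ckpt", "checkpoints/model.pt"))
```

If the key turns out to be nested under another section, editing the file by hand or with a YAML library is the safer route.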
+ Loading directly from Python:
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ # Download model.pt from the Hub (cached locally after the first call).
+ ckpt = hf_hub_download(repo_id="BWGZK/EndlessWorld", filename="model.pt")
+ # Load onto the CPU first; move the weights to the GPU once the model is built.
+ state_dict = torch.load(ckpt, map_location="cpu")
+ ```
+
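Training checkpoints are often saved as a nested dictionary rather than a flat weight mapping, so it can help to normalize the loaded object before passing it to `load_state_dict`. The helpers below are a hedged sketch. The candidate wrapper keys (`generator`, `generator_ema`, `state_dict`, `model`) and the `module.` prefix are guesses about common layouts, not confirmed properties of `model.pt`, and both function names are hypothetical.

```python
from typing import Any, Dict, Mapping, Sequence


def unwrap_state_dict(
    obj: Any,
    candidate_keys: Sequence[str] = ("generator", "generator_ema", "state_dict", "model"),
) -> Mapping[str, Any]:
    """Return the nested weight mapping if one of the guessed wrapper keys is present."""
    if not isinstance(obj, Mapping):
        raise TypeError(f"expected a mapping, got {type(obj).__name__}")
    for key in candidate_keys:
        inner = obj.get(key)
        if isinstance(inner, Mapping):
            return inner
    return obj  # already flat, or wrapped under a key we did not guess


def strip_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Drop a leading wrapper prefix (for example one added by a parallel wrapper) from keys."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): tensor
        for name, tensor in state_dict.items()
    }
```

Printing `list(state_dict.keys())[:5]` after loading is a quick way to check which layout the file actually uses before relying on either helper.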
+ ## Training
+
+ - **Run:** `self_forcing_dmd_separate` (DMD distillation, separate fake-score
+   network), step **1000**.
+ - **Framework:** Multi-GPU FSDP via the [`train.py`](https://github.com/BWGZK-keke/EndlessWorld/blob/main/train.py)
+   entry point with [`configs/self_forcing_dmd.yaml`](https://github.com/BWGZK-keke/EndlessWorld/blob/main/configs/self_forcing_dmd.yaml).
+
+ ## Citation
+
+ ```bibtex
+ @article{zhang2025endlessworld,
+   title = {Endless World: Real-Time 3D-Aware Long Video Generation},
+   author = {Zhang, Ke and others},
+   journal = {arXiv preprint arXiv:2512.12430},
+   year = {2025}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0, the same license as the upstream Wan2.1 and Self-Forcing projects.