metadata
license: apache-2.0
library_name: pytorch
pipeline_tag: text-to-video
tags:
- text-to-video
- video-generation
- streaming
- self-forcing
- wan2.1
- 3d-aware
base_model: Wan-AI/Wan2.1-T2V-1.3B
EndlessWorld — Real-Time 3D-Aware Long Video Generation
Checkpoint for EndlessWorld, a streaming video diffusion model that produces unbounded-length, 3D-consistent videos in real time on a single GPU.
- Paper: arXiv:2512.12430
- Code: github.com/BWGZK-keke/EndlessWorld
- Base model: Wan-AI/Wan2.1-T2V-1.3B
- 3D encoder: lhjiang/anysplat
What's in this repo
| File | Description |
|---|---|
| `model.pt` | DMD-distilled generator weights for the EndlessWorld causal Wan model (step 1000 of the `self_forcing_dmd_separate` SOTA run). |
This is the generator checkpoint only. To run inference you also need:
- The Wan2.1-T2V-1.3B base weights (text encoder, VAE, etc.)
- The AnySplat 3D Gaussian feature encoder
See the GitHub README for the full setup.
Method
EndlessWorld extends the Self-Forcing causal diffusion framework (Wan2.1 T2V-1.3B backbone) with a Global 3D-Aware Attention module that injects scene geometry — extracted on the fly by AnySplat — into the conditional embedding of every autoregressive chunk.
Three ingredients:
- Conditional autoregressive (self-forcing) training: frames are denoised block by block with a KV cache, conditioning each new block on previously generated content.
- Global 3D-Aware Attention: the `CrossAttentionFusion` and `To3D` modules ingest 3D Gaussian features produced by AnySplat and fuse them with the text embedding, giving the generator a persistent geometric memory of the world rendered so far.
- Real-time streaming inference: the rollout loop re-extracts 3D features from the most recently decoded chunk and feeds the fused embedding back into the causal generator, enabling indefinite extension on a single GPU.
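The control flow of the streaming rollout can be sketched in plain Python. This is only an illustration of the loop described above, not the repository's implementation: `rollout`, `RolloutState`, and the three callables (`generate_chunk`, `extract_3d`, `fuse`) are hypothetical names, the models are replaced by string-producing stubs, and conditioning the first chunk on text alone is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RolloutState:
    kv_cache: List[str] = field(default_factory=list)  # stand-in for cached keys/values
    chunks: List[str] = field(default_factory=list)    # decoded chunks, in order


def rollout(text_emb: str, num_chunks: int,
            generate_chunk: Callable, extract_3d: Callable, fuse: Callable) -> RolloutState:
    """Hypothetical streaming loop: generate, re-extract geometry, fuse, repeat."""
    state = RolloutState()
    cond = text_emb  # assumption: no geometry exists before the first chunk
    for _ in range(num_chunks):
        chunk = generate_chunk(cond, state.kv_cache)  # causal generation on past content
        state.kv_cache.append(chunk)
        state.chunks.append(chunk)
        feats = extract_3d(chunk)     # geometry from the most recent chunk only
        cond = fuse(text_emb, feats)  # fused embedding conditions the next chunk
    return state


# Toy stubs so the loop can run without any model weights.
def toy_generate(cond, cache):
    return f"chunk{len(cache)}<{cond}>"

def toy_extract(chunk):
    return f"geo({chunk.split('<')[0]})"

def toy_fuse(text, feats):
    return f"{text}+{feats}"

state = rollout("prompt", 3, toy_generate, toy_extract, toy_fuse)
print(state.chunks)
```

In this sketch the cache simply grows with every chunk; a real-time system would presumably bound it, but how the repository does so is not stated here.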
Quick start
git clone https://github.com/BWGZK-keke/EndlessWorld
cd EndlessWorld
pip install -r requirements.txt
# Download this checkpoint
huggingface-cli download BWGZK/EndlessWorld model.pt --local-dir checkpoints/
# Update configs/self_forcing_dmd.yaml -> generator_ckpt: checkpoints/model.pt
bash test.sh
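The config edit mentioned in the comment above can also be scripted. The following is a minimal sketch that assumes `generator_ckpt` is a top-level key written on a single line of the YAML file; the card does not show the file's layout, so check it before relying on this. `set_generator_ckpt` is a hypothetical helper, not part of the repository.

```python
import re


def set_generator_ckpt(yaml_text: str, ckpt_path: str) -> str:
    """Set a top-level `generator_ckpt:` line, appending it if none exists."""
    line = f"generator_ckpt: {ckpt_path}"
    pattern = re.compile(r"^generator_ckpt:.*$", flags=re.MULTILINE)
    if pattern.search(yaml_text):
        # A callable replacement avoids backslash-escape handling in paths.
        return pattern.sub(lambda _m: line, yaml_text, count=1)
    sep = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return yaml_text + sep + line + "\n"


example = "seed: 0\ngenerator_ckpt: old.pt\nnum_steps: 4\n"  # made-up config content
print(set_generator_ckpt(example, "checkpoints/model.pt"))
```

To apply it to the real file, read `configs/self_forcing_dmd.yaml` as text, pass it through the helper, and write the result back.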
Loading directly from Python:
import torch
from huggingface_hub import hf_hub_download
ckpt = hf_hub_download(repo_id="BWGZK/EndlessWorld", filename="model.pt")
state_dict = torch.load(ckpt, map_location="cpu")
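This card does not say whether `model.pt` holds a flat state dict or nests the weights under further keys, so it helps to inspect the loaded object before passing it to a model. The helper below is a hypothetical, dependency-free sketch: it treats any value with a `.shape` attribute as a tensor and describes everything else by type.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj):
    """Return {top-level key: short description} for a loaded checkpoint object."""
    if not isinstance(obj, Mapping):
        return {"<root>": type(obj).__name__}
    summary = {}
    for key, value in obj.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            summary[key] = f"tensor{tuple(shape)}"  # tensor-like: report its shape
        elif isinstance(value, Mapping):
            summary[key] = f"dict with {len(value)} entries"  # nested state dict
        else:
            summary[key] = type(value).__name__
    return summary


class FakeTensor:
    """Stand-in for a tensor so the example runs without PyTorch."""
    shape = (2, 3)

# Made-up checkpoint layout, for illustration only.
print(summarize_checkpoint({"weights": {"w": FakeTensor()}, "step": 1000}))
```

With the snippet above, calling `summarize_checkpoint(state_dict)` shows whether the weights sit at the top level or one level down.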
Training
- Framework: Multi-GPU FSDP via the `train.py` entry point with `configs/self_forcing_dmd.yaml`.
Citation
@article{zhang2025endlessworld,
title = {Endless World: Real-Time 3D-Aware Long Video Generation},
author = {Zhang, Ke and others},
journal = {arXiv preprint arXiv:2512.12430},
year = {2025}
}
License
Apache 2.0 — same as the upstream Wan2.1 and Self-Forcing projects.
