
---
license: apache-2.0
library_name: pytorch
pipeline_tag: text-to-video
tags:
  - text-to-video
  - video-generation
  - streaming
  - self-forcing
  - wan2.1
  - 3d-aware
base_model: Wan-AI/Wan2.1-T2V-1.3B
---

EndlessWorld — Real-Time 3D-Aware Long Video Generation

Checkpoint for EndlessWorld, a streaming video diffusion model that produces unbounded-length, 3D-consistent videos in real time on a single GPU.

Paper: arXiv:2512.12430
Code: github.com/BWGZK-keke/EndlessWorld
Base model: Wan-AI/Wan2.1-T2V-1.3B
3D encoder: lhjiang/anysplat

What's in this repo

| File | Description |
| --- | --- |
| `model.pt` | DMD-distilled generator weights for the EndlessWorld causal Wan model (step 1000 of the `self_forcing_dmd_separate` SOTA run). |

This is the generator checkpoint only. To run inference you also need:

- The Wan2.1-T2V-1.3B base weights (text encoder, VAE, etc.)
- The AnySplat 3D Gaussian feature encoder
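Both extra dependencies can be fetched from the Hugging Face Hub. The sketch below assumes they are Hub repositories under the IDs listed at the top of this card (`Wan-AI/Wan2.1-T2V-1.3B`, `lhjiang/anysplat`). It calls the real `huggingface_hub.snapshot_download` function. The local directories are hypothetical: `wan_models/Wan2.1-T2V-1.3B` mirrors the upstream Self-Forcing instructions, and `anysplat` is made up. Check the GitHub README for the paths the code actually expects.

```python
from typing import Dict, List, Tuple

# Hypothetical local layout; confirm the expected paths in the GitHub README.
DEPENDENCIES: Dict[str, str] = {
    "Wan-AI/Wan2.1-T2V-1.3B": "wan_models/Wan2.1-T2V-1.3B",
    "lhjiang/anysplat": "anysplat",
}


def fetch_dependencies(dry_run: bool = True) -> List[Tuple[str, str]]:
    """Return the (repo_id, local_dir) download plan; download only when dry_run is False."""
    plan = list(DEPENDENCIES.items())
    if not dry_run:
        # Imported lazily so a dry run works without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        for repo_id, local_dir in plan:
            snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return plan
```

Call `fetch_dependencies()` first to review the plan. Then call `fetch_dependencies(dry_run=False)` to download.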

See the GitHub README for the full setup.

Method

EndlessWorld extends the Self-Forcing causal diffusion framework (Wan2.1-T2V-1.3B backbone) with a Global 3D-Aware Attention module. This module injects scene geometry, extracted on the fly by AnySplat, into the conditional embedding of every autoregressive chunk.
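One way to picture injecting geometry into the conditional embedding is single-head cross-attention: the text-embedding tokens act as queries over the 3D feature tokens, followed by a residual add. The dependency-free sketch below works under stated assumptions:

- `cross_attention_fuse` is a hypothetical name.
- Learned projections are omitted.
- It is not the repository's `CrossAttentionFusion`/`To3D` code, whose internals this card does not describe.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(text: Matrix, geo: Matrix) -> Matrix:
    """Hypothetical stand-in for fusing 3D feature tokens into a text embedding.

    Single head; each text token is a query, each 3D token serves as key and value.
    Learned Q/K/V/output projections are omitted (identity) to keep the sketch small.
    """
    if not geo:
        # No geometry yet (e.g. before the first chunk): pass the text embedding through.
        return [row[:] for row in text]
    dim = len(text[0])
    if any(len(row) != dim for row in text + geo):
        raise ValueError("text and 3D tokens must have the same width")
    scale = 1.0 / math.sqrt(dim)
    fused: Matrix = []
    for query in text:
        weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in geo])
        # Attention-weighted average of the 3D tokens for this query.
        attended = [sum(w * value[c] for w, value in zip(weights, geo)) for c in range(dim)]
        # Residual add keeps the original text conditioning intact.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused
```

The residual add means an empty or uninformative geometry signal leaves the text conditioning unchanged. That is one plausible reason to fuse rather than replace, but it is a design choice of this sketch.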

It combines three ingredients:

1. Conditional autoregressive (self-forcing) training: frames are denoised block by block with a KV cache, so each new block is conditioned on previously generated content.
2. Global 3D-Aware Attention: `CrossAttentionFusion` and `To3D` modules ingest the 3D Gaussian features produced by AnySplat and fuse them with the text embedding. This gives the generator a persistent geometric memory of the world rendered so far.
3. Real-time streaming inference: the rollout loop re-extracts 3D features from the most recently decoded chunk and feeds the fused embedding back into the causal generator. This enables indefinite extension on a single GPU.
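The streaming rollout, together with block-wise KV-cache conditioning, can be sketched as a simple loop. Everything here is a stand-in:

- `stream_rollout`, `generate_chunk`, `extract_3d` and `fuse` are hypothetical names.
- Chunks and features are plain lists.
- The bounded cache window (`max_cached_chunks`) is an assumption of this sketch, not a documented detail of EndlessWorld.

```python
from collections import deque
from typing import Any, Callable, List, Optional, Sequence, Tuple

Chunk = List[float]     # placeholder for one decoded chunk of frames
Features = List[float]  # placeholder for 3D features extracted from a chunk


def stream_rollout(
    num_chunks: int,
    text_emb: Sequence[float],
    generate_chunk: Callable[[Sequence[float], Sequence[Any]], Tuple[Chunk, Any]],
    extract_3d: Callable[[Chunk], Features],
    fuse: Callable[[Sequence[float], Optional[Features]], Sequence[float]],
    max_cached_chunks: int = 4,
) -> List[Chunk]:
    """Generate num_chunks chunks, feeding geometry from each chunk into the next one's conditioning."""
    # Bounded window of cached keys/values so memory stays flat as the video grows
    # (the window size is an assumption of this sketch).
    kv_cache: deque = deque(maxlen=max_cached_chunks)
    geometry: Optional[Features] = None  # no 3D memory exists before the first chunk
    video: List[Chunk] = []
    for _ in range(num_chunks):
        cond = fuse(text_emb, geometry)                   # text embedding + current 3D features
        chunk, kv_entry = generate_chunk(cond, kv_cache)  # denoise one block against cached context
        kv_cache.append(kv_entry)                         # later blocks attend to this one
        geometry = extract_3d(chunk)                      # refresh 3D features from the newest chunk
        video.append(chunk)
    return video
```

The toy stubs below show how the chunk count and cache window interact. `generate_chunk` returns the sum of its conditioning plus the current cache length, so the cache limit becomes visible in the output.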

Quick start

```bash
git clone https://github.com/BWGZK-keke/EndlessWorld
cd EndlessWorld
pip install -r requirements.txt

# Download this checkpoint
huggingface-cli download BWGZK/EndlessWorld model.pt --local-dir checkpoints/

# Update configs/self_forcing_dmd.yaml -> generator_ckpt: checkpoints/model.pt
bash test.sh
```
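The config edit noted in the comment above can be scripted. This sketch assumes `generator_ckpt` is a top-level key in `configs/self_forcing_dmd.yaml`, as that comment suggests. `set_generator_ckpt` is a hypothetical helper. It rewrites the single line with a regular expression rather than round-tripping through a YAML parser, so comments and formatting elsewhere in the file survive.

```python
import re

# Matches an unindented (top-level) generator_ckpt line; an assumption about the config layout.
_GENERATOR_CKPT = re.compile(r"^generator_ckpt:.*$", re.MULTILINE)


def set_generator_ckpt(config_text: str, ckpt_path: str) -> str:
    """Point the top-level generator_ckpt key at ckpt_path, leaving the rest of the text untouched."""
    new_line = f"generator_ckpt: {ckpt_path}"
    if _GENERATOR_CKPT.search(config_text):
        # A function replacement avoids backslash processing (e.g. Windows paths).
        return _GENERATOR_CKPT.sub(lambda _match: new_line, config_text, count=1)
    # Key absent: append it, adding a newline first if the file does not end with one.
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return f"{config_text}{separator}{new_line}\n"
```

For example, with `from pathlib import Path` and `cfg = Path("configs/self_forcing_dmd.yaml")`, run `cfg.write_text(set_generator_ckpt(cfg.read_text(), "checkpoints/model.pt"))`.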

Loading directly from Python:

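```python
import torch
from huggingface_hub import hf_hub_download

# Download model.pt into the local Hugging Face cache and return its file path
ckpt = hf_hub_download(repo_id="BWGZK/EndlessWorld", filename="model.pt")
# Load onto the CPU so reading the file needs no GPU memory
state_dict = torch.load(ckpt, map_location="cpu")
```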
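This card does not document the internal layout of `model.pt`. The following is a minimal sketch for inspecting it. It assumes the file may follow the upstream Self-Forcing convention of a top-level `generator` entry (and optionally `generator_ema`), and falls back to treating the file as a flat state dict. Both helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Mapping
from typing import Any


def select_generator_weights(ckpt: Mapping[str, Any], prefer_ema: bool = False) -> Mapping[str, Any]:
    """Return the generator state dict from a loaded checkpoint (key names are an assumption)."""
    order = ["generator_ema", "generator"] if prefer_ema else ["generator", "generator_ema"]
    for key in order:
        if key in ckpt and isinstance(ckpt[key], Mapping):
            return ckpt[key]
    # No wrapper key found: assume the file is already a flat state dict.
    return ckpt


def summarize_prefixes(state_dict: Mapping[str, Any], depth: int = 1) -> Counter:
    """Count parameter names grouped by their first `depth` dotted components."""
    return Counter(".".join(name.split(".")[:depth]) for name in state_dict)
```

For example, `weights = select_generator_weights(state_dict)` followed by `print(summarize_prefixes(weights).most_common(5))` shows which top-level modules the weights cover. Do this before calling `load_state_dict` on the model built by the repository code.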

Training

Training uses multi-GPU FSDP (PyTorch Fully Sharded Data Parallel) through the `train.py` entry point with `configs/self_forcing_dmd.yaml`.

Citation

```bibtex
@article{zhang2025endlessworld,
  title   = {Endless World: Real-Time 3D-Aware Long Video Generation},
  author  = {Zhang, Ke and others},
  journal = {arXiv preprint arXiv:2512.12430},
  year    = {2025}
}
```

License

Apache 2.0 — same as the upstream Wan2.1 and Self-Forcing projects.