lapp0 committed on
Commit b441e6b · verified · 1 Parent(s): c1b6f9a

Update README.md

Remove incompatible diffusers references

Files changed (1):
  1. README.md +0 -87
README.md CHANGED
@@ -25,93 +25,6 @@ In order to simply use Waypoint-1-Small, we recommend [Biome](https://github.com
 
 To run the model locally, we recommend an NVIDIA RTX 5090, which should achieve 20-30 FPS, or an RTX 6000 Pro Blackwell, which should achieve ~35 FPS.
 
- # Run with Diffusers Modular Pipelines
-
- World Engine and Waypoint-1 can be used with [Diffusers Modular Pipelines](https://huggingface.co/docs/diffusers/main/en/api/modular_pipelines/modular_pipeline).
-
- ## Setup
-
- ```bash
- uv venv -p 3.11 && uv pip install \
-     "torch>=2.9.0" \
-     "diffusers>=0.36.0" \
-     "transformers>=4.57.1" \
-     "einops>=0.8.0" \
-     "tensordict>=0.5.0" \
-     regex \
-     ftfy \
-     imageio \
-     imageio-ffmpeg \
-     tqdm
- ```
-
- ## Usage Example
-
- ```python
- import random
- import torch
-
- from tqdm import tqdm
- from dataclasses import dataclass, field
- from typing import Set, Tuple
- from diffusers.modular_pipelines import ModularPipeline
- from diffusers.utils import load_image, export_to_video
-
- @dataclass
- class CtrlInput:
-     button: Set[int] = field(default_factory=set)  # pressed button IDs
-     mouse: Tuple[float, float] = (0.0, 0.0)  # (x, y) velocity
-
-
- # Generate random control trajectories
- ctrl = lambda: random.choice(
-     [
-         CtrlInput(button={48, 42}, mouse=(0.4, 0.3)),
-         CtrlInput(mouse=(0.1, 0.2)),
-         CtrlInput(button={95, 32, 105}),
-     ]
- )
- model_id = "Overworld/Waypoint-1-Small"
-
- pipe = ModularPipeline.from_pretrained(model_id, trust_remote_code=True)
- pipe.load_components(
-     device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True
- )
- pipe.transformer.apply_inference_patches()
-
- # Optional quantization step.
- # Available options: nvfp4 (if running on Blackwell hardware), fp8, w8a8
- # pipe.transformer.quantize("nvfp4")
- pipe.transformer.compile(fullgraph=True, mode="max-autotune", dynamic=False)
- pipe.vae.bake_weight_norm()
- pipe.vae.compile(fullgraph=True, mode="max-autotune")
-
- prompt = "A fun game"
- image = load_image(
-     "https://gist.github.com/user-attachments/assets/4adc5a3d-6980-4d1e-b6e8-9033cdf61c66"
- )
-
- num_frames = 240
- outputs = []
-
- # Create the world state from an initial image
- state = pipe(prompt=prompt, image=image, button=ctrl().button, mouse=ctrl().mouse)
- outputs.append(state.values["images"])
-
- state.values["image"] = None
- for _ in tqdm(range(1, num_frames)):
-     state = pipe(
-         state,
-         prompt=prompt,
-         button=ctrl().button,
-         mouse=ctrl().mouse,
-         output_type="pil",
-     )
-     outputs.append(state.values["images"])
-
- export_to_video(outputs, "waypoint-1-small.mp4", fps=60)
- ```
-
 # Keywords
 
 To properly explain limitations and misuse we must define some terms. While the model can be used for general interactive video generation tasks, we herein define interacting with the model by sending controls and receiving new frames as “playing” the model, and the agent/user inputting controls as the “player”. The model has two forms of output: continuations and generations. Continuations occur when seed frames are given and no inputs are given. For example, if a scene has fire or water, you may see them evolve progressively in the generated frames even if no action is given. Likewise, if you seed with an image of a humanoid entity, the entity will persist on the screen as you move/look around. However, generations occur when the player plays with the model extensively, for example moving around, turning around fully, or interacting with objects/items. Continuations roughly correspond to moving around already-existing information in the given context frames, while generations correspond to creating entirely new information.
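Although the Diffusers section is removed above, the diff still documents the intended interaction pattern: build a control input, seed the world state from an initial image, then step the pipeline once per frame. Below is a minimal, self-contained sketch of that loop. The `CtrlInput` dataclass and the fixed control choices come from the removed snippet; `roll_out`, `stub_pipe`, and the dict-based state are my own simplifications for illustration, not the real `ModularPipeline` API (which also takes a prompt and returns a pipeline state object).

```python
import random
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class CtrlInput:
    """One step of player input: pressed button IDs plus (x, y) mouse velocity."""
    button: Set[int] = field(default_factory=set)
    mouse: Tuple[float, float] = (0.0, 0.0)


def random_ctrl() -> CtrlInput:
    """Sample one of a few fixed control states, as in the removed snippet."""
    return random.choice([
        CtrlInput(button={48, 42}, mouse=(0.4, 0.3)),
        CtrlInput(mouse=(0.1, 0.2)),
        CtrlInput(button={95, 32, 105}),
    ])


def roll_out(pipe, first_frame, num_frames: int) -> List:
    """Seed once from an image, then step the pipeline autoregressively,
    collecting one frame per step -- the loop shape of the removed example."""
    ctrl = random_ctrl()
    state = pipe(image=first_frame, button=ctrl.button, mouse=ctrl.mouse)
    frames = [state["images"]]
    for _ in range(1, num_frames):
        ctrl = random_ctrl()
        state = pipe(state, button=ctrl.button, mouse=ctrl.mouse)
        frames.append(state["images"])
    return frames


# Hypothetical stand-in for the real pipeline, only to exercise the loop shape:
# a fresh call (no prior state) starts at step 0; each later call advances it.
def stub_pipe(state=None, *, image=None, button=frozenset(), mouse=(0.0, 0.0)):
    step = 0 if state is None else state["step"] + 1
    return {"step": step, "images": f"frame-{step}"}
```

Running `roll_out(stub_pipe, "seed.png", 240)` yields one placeholder frame per step; with the real pipeline, those frames would be the PIL images passed to `export_to_video`.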
 