LouisCastricato committed
Commit 6a54f89 · verified · 1 Parent(s): ef135cd

Update README.md

Files changed (1):
1. README.md +49 -3

README.md CHANGED
@@ -1,3 +1,49 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - WM
+ - Diffusion
+ - Egocentric
+ ---
+
+ Waypoint-1-Small is a 2.3-billion-parameter, control- and text-conditioned causal diffusion model. It is a transformer architecture using rectified flow, distilled via self-forcing with distribution matching distillation (DMD). The model autoregressively generates new frames given historical frames, actions, and text.
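+
+ As a rough illustration of the autoregressive loop described above, here is a minimal sketch in Python. The `WaypointModel` class, its method names, and the control format are hypothetical placeholders rather than the released API; only the conditioning structure (history frames, actions, and a text prompt) is taken from the description.
+
+ ```python
+ # Hypothetical sketch of autoregressive frame generation; the class and
+ # method names are illustrative placeholders, not the real API.
+ import torch
+
+ class WaypointModel:
+     """Stand-in for the control-and-text-conditioned causal diffusion model."""
+     def denoise_next_frame(self, frames, actions, prompt):
+         # A rectified-flow model distilled with self-forcing and DMD would run
+         # a few denoising steps here; this stub just returns a blank frame.
+         return torch.zeros_like(frames[-1])
+
+ def play(model, seed_frames, prompt, controls):
+     """Autoregressively extend a frame history, one control input per step."""
+     frames = list(seed_frames)      # any number of starting frames
+     actions = []
+     for control in controls:
+         actions.append(control)     # the player's input for this step
+         next_frame = model.denoise_next_frame(frames, actions, prompt)
+         frames.append(next_frame)   # generated frame re-enters the context
+     return frames
+ ```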
+
+ # Capabilities
+
+ - Can generate worlds in real time on high-end consumer hardware
+ - Allows exploration of and interaction with worlds via control inputs
+ - Allows guiding the generated world via text prompts
+ - Can be prompted with any number of starting frames and controls
+
+ # Usage
+
+ To get started with Waypoint-1-Small, we recommend [Biome](https://github.com/Overworldai/Biome) for local use or the Hugging Face-hosted [Gradio Space](TODO).
+
+ To run the model locally, we recommend an NVIDIA RTX 5090, which should achieve 20-30 FPS, or an RTX 6000 Pro Blackwell, which should achieve ~35 FPS.
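+
+ If you fetch the weights yourself, `snapshot_download` from the `huggingface_hub` library works as sketched below; note that the repo id is an assumption based on this model card, so substitute the actual one.
+
+ ```python
+ # Download the checkpoint for local use. snapshot_download is a real
+ # huggingface_hub function; the repo id is an assumption, verify it first.
+ from huggingface_hub import snapshot_download
+
+ local_dir = snapshot_download(repo_id="Overworldai/Waypoint-1-Small")
+ print(f"Weights available at {local_dir}")
+ ```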
+
+ # Keywords
+
+ To explain the limitations and potential misuse of the model, we must first define some terms. While the model can be used for general interactive video generation tasks, we define interacting with the model by sending controls and receiving new frames as “playing” the model, and the agent or user supplying the controls as the “player”.
+
+ The model has two forms of output: continuations and generations. Continuations occur when seed frames are given but no inputs are. For example, if a scene contains fire or water, you may see them evolve progressively in the generated frames even though no action is given; likewise, if you seed with an image of a humanoid entity, the entity will persist on screen as you move and look around. Generations, by contrast, occur when the player plays with the model extensively, for example by moving around, turning around fully, or interacting with objects and items. Roughly, continuations rearrange information already present in the given context frames, while generations create entirely new information.
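+
+ To make the distinction concrete, here is a short sketch reusing the hypothetical `play` helper from the Python example above; the control dictionary format is likewise made up for illustration.
+
+ ```python
+ # Continuation vs. generation, using the hypothetical play() helper above.
+ model = WaypointModel()
+ seed_frames = [torch.zeros(3, 360, 640)]  # one seed frame (C, H, W), made up
+ NOOP = {"move": (0.0, 0.0), "look": (0.0, 0.0)}  # illustrative control format
+
+ # Continuation: seed frames evolve on their own (fire flickers, water flows).
+ continuation = play(model, seed_frames, prompt="a campfire at dusk",
+                     controls=[NOOP] * 32)
+
+ # Generation: sustained input (here, turning fully around) forces the model
+ # to invent content that was never visible in the seed frames.
+ turning = [{"move": (0.0, 0.0), "look": (11.25, 0.0)} for _ in range(32)]
+ generation = play(model, seed_frames, prompt="a campfire at dusk",
+                   controls=turning)
+ ```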
+
+ # Limitations
+
+ Continuations can plausibly model any inputted scene or photo, with fidelity depending largely on the seed frames given. For generations, the model may occasionally:
+
+ - Ignore the given text prompt
+ - Ignore certain controls in specific contexts
+ - Fail to generate realistic text or interactive HUD/UI elements
+ - Fail to generate human or animal entities
+ - Fail to generate realistic motion for given entities
+ - Fail to generate faces
+
+ Note that prompt adherence is heavily dependent on prompting strategy.
+
+ # Out of Scope Usage
+
+ The model and its derivatives must not be used:
+
+ - For harassment or bullying
+ - For the purpose of exploiting or harming minors in any way
+ - For simulating extremely violent acts
+ - For generating violent or gory video
+ - For facilitating large-scale disinformation campaigns
+ - For generating any sexually explicit or suggestive material