---
license: apache-2.0
library_name: world_engine
tags:
  - world-model
  - interactive-video
  - generative-worlds
  - real-time
  - consumer-gpu
  - diffusion
  - transformer
---

<video src="https://huggingface.co/Overworld/Waypoint-1.5-1B/resolve/main/assets/wp_1.5.mp4" controls autoplay loop muted playsinline width="100%"></video>

# Waypoint-1.5-1B

Waypoint-1.5-1B is the smallest dense model in Overworld’s Waypoint-1.5 family of real-time interactive video world models. Waypoint-1.5 is designed for local, real-time generation on consumer hardware, ranging from the latest RTX 50 series cards to older RTX 30 series cards.

## Model Details

- **Developed by:** Overworld
- **Model type:** Real-time interactive video world model
- **Model family:** [Waypoint-1.5](https://huggingface.co/collections/Overworld/waypoint-15)
- **Parameter count:** 1.2B
- **Context length / frame context:** 512 frames
- **Input modalities:** Starting image or video conditioning, keyboard / mouse inputs
- **Output:** Interactive generated video frames / world rollout
- **License:** Apache 2.0
- **Paper:** Coming soon
- **Streaming Demo:** [Overworld Stream](https://www.overworld.stream/)
- **Desktop Client:** [Biome](https://over.world/install)
- **Core Inference Library:** [Overworldai/world_engine](https://github.com/Wayfarer-Labs/world_engine)

## Model Summary

Waypoint-1.5 is Overworld’s next-generation real-time video world model release. It builds on the original Waypoint-1 release by improving visual fidelity, expanding the range of consumer hardware that can run the model, and pushing further toward responsive, interactive world simulation without datacenter-scale compute.

At the family level, Waypoint-1.5 targets real-time generation at up to **720p and 60 FPS**, and introduces **two model tiers**: a **720p** model for higher-performance systems and a [**360p** model](https://huggingface.co/Overworld/Waypoint-1.5-1B-360P) intended to run smoothly across a broader range of gaming PCs and Apple Silicon Macs. The release was also trained on **substantially more data than Waypoint-1**, improving coherence and motion consistency over longer interactions.

## What makes Waypoint-1.5 different

Waypoint-1.5 is built around a simple product constraint: generative worlds should be usable as **interactive systems**, not just watched as offline demos.

Compared with a conventional video generation workflow, the Waypoint family is designed for:

- **Real-time interaction** rather than offline batch generation
- **Low-latency responsiveness** to user inputs
- **Local execution** on consumer hardware
- **Persistent world rollouts** where coherence across time matters as much as single-frame fidelity

In practice, this means the model is intended to be used inside an interactive runtime that conditions generation on previous frames and live control inputs.

## Intended Use

This model is intended for:

- Research on real-time world models and interactive video generation
- Prototyping AI-native game and simulation experiences
- Creative tools for interactive environments, world exploration, and live generative scenes
- Experimentation with low-latency generative systems on local hardware
- Education and research into control-conditioned video generation

## Out-of-Scope Use

This model is **not** intended for:

- Generating illegal content or content that exploits, sexualizes, or endangers minors
- Generating non-consensual sexual content or explicit sexual content where prohibited
- Impersonation, harassment, or deceptive identity-based content
- Generating copyrighted characters, branded IP, or celebrity likenesses in ways that infringe rights or violate platform rules
- Safety-critical decision-making, surveillance, or high-stakes automated systems
- Any deployment that removes reasonable safeguards while serving end users at scale

## Usage

This checkpoint is intended to be used with Overworld’s interactive runtime stack.

- Play on our official desktop client, [Biome](https://over.world/install)
- Use our [world_engine](https://github.com/Wayfarer-Labs/world_engine) inference library to build your own applications


### Recommended setup

- **Recommended GPU / device:** RTX 5090
- **Expected FPS on reference hardware:** 56 FPS
- **Supported GPUs:** Desktop RTX 30 Series and later. On weaker hardware, consider running [Overworld/Waypoint-1.5-1B-360P](https://huggingface.co/Overworld/Waypoint-1.5-1B-360P) instead.


### Architecture

- **Backbone:** Autoregressive Diffusion Transformer
- **Autoencoder:** [Tiny Hunyuan Autoencoder (taehv1_5)](https://github.com/madebyollin/taehv) — 4x temporal compression, 8x spatial compression, 32 latent channels
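
To make the autoencoder's compression factors concrete, the sketch below works out the latent tensor shape for a chunk of frames. It is plain arithmetic, not part of `world_engine` or `taehv`: the `latent_shape` helper and `LatentShape` type are hypothetical names, and the 1280x720 RGB frame size is an assumption about what "720p" means here. Inputs that do not divide evenly are rejected rather than padded, since padding behaviour is not described in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    frames: int
    channels: int
    height: int
    width: int

    def numel(self) -> int:
        return self.frames * self.channels * self.height * self.width


def latent_shape(num_frames: int, height: int, width: int,
                 temporal: int = 4, spatial: int = 8,
                 channels: int = 32) -> LatentShape:
    """Latent shape for 4x temporal / 8x spatial compression, 32 channels.

    Hypothetical helper: assumes every dimension divides evenly.
    """
    if num_frames % temporal or height % spatial or width % spatial:
        raise ValueError("dimensions must be multiples of the compression factors")
    return LatentShape(num_frames // temporal, channels,
                       height // spatial, width // spatial)


# Assumed 720p frame size: 1280x720, 3 colour channels.
shape = latent_shape(num_frames=4, height=720, width=1280)
pixel_values = 4 * 3 * 720 * 1280
compression_ratio = pixel_values / shape.numel()

print(shape)              # LatentShape(frames=1, channels=32, height=90, width=160)
print(compression_ratio)  # 24.0
```

Under these assumptions, four 720p frames map to one 32x90x160 latent frame, about 24 times fewer values than the raw pixels.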

### Training Data

Waypoint-1.5 was trained on **nearly 100× more data than Waypoint-1**, with the release emphasizing better coherence, motion consistency, and broader hardware accessibility.

## Performance

### Waypoint 1 vs Waypoint 1.5

| | Waypoint 1 | Waypoint 1.5 |
|---|---|---|
| Resolution | 360P | 720P |
| Context window | 2 seconds | 10 seconds |
| 4-step unquantized (5090) | 20 FPS | 56 FPS |
| 4-step w8a8 quantized (5090) | N/A | 72 FPS |
| 4-step w8a8 (3090) | N/A | 30 FPS |

![Generation Throughput — Waypoint 1 vs Waypoint 1.5](assets/perf_chart.png)
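
The throughput figures in the table can also be read as per-frame time budgets. The sketch below converts each FPS value into milliseconds per frame and computes the speedup from w8a8 quantization on the 5090. The dictionary keys and the `frame_budget_ms` helper are illustrative names, not part of any Overworld API, and the numbers are copied from the table rather than measured.

```python
# FPS values copied from the comparison table above.
FPS = {
    "waypoint1_unquantized_5090": 20.0,
    "waypoint1_5_unquantized_5090": 56.0,
    "waypoint1_5_w8a8_5090": 72.0,
    "waypoint1_5_w8a8_3090": 30.0,
}


def frame_budget_ms(fps: float) -> float:
    """Average time available per generated frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for name, fps in FPS.items():
    print(f"{name}: {frame_budget_ms(fps):.2f} ms/frame")

# Speedup from w8a8 quantization on the same GPU (5090).
quant_speedup = FPS["waypoint1_5_w8a8_5090"] / FPS["waypoint1_5_unquantized_5090"]
print(f"w8a8 speedup on 5090: {quant_speedup:.2f}x")
```

At 56 FPS the whole pipeline has roughly 17.9 ms per frame on average, and at 72 FPS roughly 13.9 ms. The Waypoint 1 and Waypoint 1.5 rows are not a like-for-like comparison, because the two models run at different resolutions.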

## Limitations

This model has important limitations.

- It is a generative world model, not a simulator with guaranteed physical accuracy.
- Long interactive rollouts may drift, collapse, or become inconsistent.
- The model may produce unstable geometry, object persistence failures, or implausible motion.
- Performance is hardware-dependent and may vary significantly by runtime stack and settings.
- Safety mitigations available in hosted deployments may not transfer fully to raw checkpoint use.
- Outputs may reflect biases, omissions, or unsafe patterns present in training data or learned world priors.

## Safety

Please see our blog post, ["Engineering Safety for Interactive World Models"](https://over.world/blog/engineering-safety-for-interactive-world-models), for details.

## Contact

- [Website](http://over.world/)
- [Discord](https://discord.gg/MEmQa7Wux4)
- [X/Twitter](https://x.com/overworld_ai)