Waypoint-1.5-1B-360P

Waypoint-1.5-1B-360P is the smallest dense model in Overworld’s Waypoint-1.5 family of real-time interactive video world models, finetuned for 360p generation. It is designed for local, real-time generation on NVIDIA laptop GPUs and (soon) Apple Silicon.

Model Details

Developed by: Overworld
Model type: Real-time interactive video world model
Model family: Waypoint-1.5
Parameter count: 1.2B
Context length / frame context: 512 frames
Input modalities: Starting image or video conditioning, keyboard / mouse inputs
Output: Interactive generated video frames / world rollout
License: Apache 2.0
Paper: Coming soon
Streaming Demo: Overworld Stream
Desktop Client: Biome
Core Inference Library: Overworldai/world_engine

Model Summary

Waypoint-1.5 is Overworld’s next-generation real-time video world model release. It builds on the original Waypoint-1 release by improving visual fidelity, expanding the range of consumer hardware that can run the model, and pushing further toward responsive, interactive world simulation without datacenter-scale compute.

At the family level, Waypoint-1.5 targets real-time generation at up to 720p and 60 FPS, and introduces two model tiers: a 720p model for desktop RTX 30 series through RTX 50 series cards, and this 360p model for laptop GPUs and (soon) Apple Silicon. The release was also trained on substantially more data than Waypoint-1, improving coherence and motion consistency over longer interactions.

What makes Waypoint-1.5 different

Waypoint-1.5 is built around a simple product constraint: generative worlds should be usable as interactive systems, not just watched as offline demos.

Compared with a conventional video generation workflow, the Waypoint family is designed for:

Real-time interaction rather than offline batch generation
Low-latency responsiveness to user inputs
Local execution on consumer hardware
Persistent world rollouts where coherence across time matters as much as single-frame fidelity

In practice, this means the model is intended to be used inside an interactive runtime that conditions generation on previous frames and live control inputs.
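The runtime loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the world_engine API: `ControlInput`, `toy_step`, and `rollout` are hypothetical names, and `toy_step` is a stand-in for the real model call. The sketch shows the two conditioning signals: a bounded window of previous frames (512 here, matching the frame context listed under Model Details) and one control input per generated frame.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence, Tuple

# Hypothetical stand-in for an image tensor; a real runtime would use GPU tensors.
Frame = List[float]


@dataclass(frozen=True)
class ControlInput:
    """Hypothetical per-frame control signal (not a world_engine type)."""
    keys: Tuple[str, ...] = ()   # keys held during this frame
    mouse_dx: float = 0.0        # horizontal mouse movement


def toy_step(context: Deque[Frame], control: ControlInput) -> Frame:
    """Placeholder for the model call: derives the next frame from the
    most recent frame in the context and the current control input."""
    last = context[-1]
    shift = control.mouse_dx + (1.0 if "w" in control.keys else 0.0)
    return [value + shift for value in last]


def rollout(start_frame: Frame,
            controls: Sequence[ControlInput],
            max_context: int = 512) -> List[Frame]:
    """Autoregressive loop: each new frame is generated from the rolling
    context, then appended to it so later frames can condition on it."""
    context: Deque[Frame] = deque([start_frame], maxlen=max_context)
    generated: List[Frame] = []
    for control in controls:
        frame = toy_step(context, control)
        context.append(frame)  # oldest frame is dropped once the window is full
        generated.append(frame)
    return generated


frames = rollout([0.0, 0.0], [ControlInput(keys=("w",)), ControlInput(mouse_dx=0.5)])
print(frames)  # [[1.0, 1.0], [1.5, 1.5]]
```

The `deque` with `maxlen` keeps memory bounded during long sessions. How the actual runtime manages its context window is not specified in this card.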

Intended Use

This model is intended for:

Research on real-time world models and interactive video generation
Prototyping AI-native game and simulation experiences
Creative tools for interactive environments, world exploration, and live generative scenes
Experimentation with low-latency generative systems on local hardware
Education and research into control-conditioned video generation

Out-of-Scope Use

This model is not intended for:

Generating illegal content or content that exploits, sexualizes, or endangers minors
Generating non-consensual sexual content or explicit sexual content where prohibited
Impersonation, harassment, or deceptive identity-based content
Generating copyrighted characters, branded IP, or celebrity likenesses in ways that infringe rights or violate platform rules
Safety-critical decision-making, surveillance, or high-stakes automated systems
Any deployment that removes reasonable safeguards while serving end users at scale

Usage

This checkpoint is intended to be used with Overworld’s interactive runtime stack.

Play on our official desktop client, Biome
Use our world_engine inference library to build your own applications

Architecture

Backbone: Autoregressive Diffusion Transformer
Autoencoder: Tiny Hunyuan Autoencoder (taehv1_5) — 4x temporal compression, 8x spatial compression, 32 latent channels
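To make the compression factors concrete, the following sketch computes the latent tensor shape for a short clip. It is a minimal arithmetic example with two assumptions that do not come from this card: a 640x360 frame size for 360p, and rounding up when a dimension is not an exact multiple of the compression factor. The function name `latent_shape` and the dimension ordering are illustrative only, and the actual autoencoder may handle frame counts and padding differently.

```python
import math

TEMPORAL_COMPRESSION = 4   # from the Architecture section
SPATIAL_COMPRESSION = 8    # from the Architecture section
LATENT_CHANNELS = 32       # from the Architecture section


def latent_shape(num_frames: int, height: int, width: int) -> tuple:
    """Return (latent_frames, channels, latent_height, latent_width).

    Rounding up is an assumption made for this sketch.
    """
    latent_frames = math.ceil(num_frames / TEMPORAL_COMPRESSION)
    latent_h = math.ceil(height / SPATIAL_COMPRESSION)
    latent_w = math.ceil(width / SPATIAL_COMPRESSION)
    return (latent_frames, LATENT_CHANNELS, latent_h, latent_w)


# Assumed 16:9 frame size for 360p; the card does not state the exact resolution.
shape = latent_shape(num_frames=4, height=360, width=640)
print(shape)  # (1, 32, 45, 80)

# Ratio of RGB input values to latent values for this clip.
rgb_values = 4 * 3 * 360 * 640
latent_values = math.prod(shape)
print(rgb_values / latent_values)  # 24.0
```

Under these assumptions, four RGB frames map to a single 32-channel latent of 45x80, so the diffusion backbone operates on roughly 24 times fewer values than the raw pixels.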

Training Data

Waypoint-1.5 was trained on nearly 100× more data than Waypoint-1. The release emphasizes better coherence, motion consistency, and broader hardware accessibility.

Limitations

This model has important limitations:

It is a generative world model, not a simulator with guaranteed physical accuracy.
Long interactive rollouts may drift, collapse, or become inconsistent.
The model may produce unstable geometry, object persistence failures, or implausible motion.
Performance is hardware-dependent and may vary significantly by runtime stack and settings.
Safety mitigations available in hosted deployments may not transfer fully to raw checkpoint use.
Outputs may reflect biases, omissions, or unsafe patterns present in training data or learned world priors.