# WorldKit / cartpole-base

A base CartPole-v1 world model trained with WorldKit.

## Model Details

| Property | Value |
|---|---|
| Architecture | JEPA (Joint-Embedding Predictive Architecture) |
| Config | `base` |
| Parameters | 13M |
| Latent Dim | 192 |
| Task | CartPole balance control |
| Training | 100 epochs on 200 episodes of pixel observations |
| Best Val Loss | 0.2958 |

## Usage

```shell
pip install worldkit
```

```python
from worldkit import WorldModel

# Load this model
model = WorldModel.from_hub("DilpreetBansi/cartpole-base")

# Encode an observation
z = model.encode(observation)  # -> (192,) latent vector

# Predict future states
result = model.predict(current_frame, actions)

# Plan to reach a goal
plan = model.plan(current_frame, goal_frame, max_steps=50)

# Score physical plausibility
score = model.plausibility(video_frames)
```

## Task: CartPole-v1

The CartPole-v1 environment requires an agent to balance a pole on a cart by applying left/right forces. The world model learns to predict future visual observations from pixel inputs, enabling planning and control in latent space.
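Planning in latent space means choosing actions by comparing predicted latents against a goal latent rather than comparing raw pixels. A minimal sketch of that idea, with stub stand-ins for `model.encode` and `model.predict` (the random projection encoder, toy dynamics, and `greedy_action` helper below are illustrative, not WorldKit API):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 192)) / 4.0  # stand-in encoder weights

def encode(obs):
    # Stand-in for model.encode: flatten pixels, project to a 192-d latent.
    return obs.ravel() @ W

def predict(z, action):
    # Toy latent dynamics: action 1 nudges the latent one way, action 0 the other.
    return z + (0.05 if action == 1 else -0.05)

def greedy_action(z, z_goal):
    # One-step greedy control: pick the discrete action whose predicted
    # next latent lies closest to the goal latent.
    return int(np.argmin([np.linalg.norm(predict(z, a) - z_goal)
                          for a in (0, 1)]))

obs = rng.random((4, 4))            # a fake 4x4 "frame"
z = encode(obs)                     # (192,) latent
action = greedy_action(z, z + 0.3)  # goal latent sits in the + direction
```

The real planner (see Architecture) optimizes whole action sequences rather than a single greedy step, but the comparison happens in the same latent space.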

## Training

Trained using WorldKit's built-in training pipeline on 200 episodes of pixel observations for 100 epochs:

```python
from worldkit import WorldModel

model = WorldModel.train(
    data="cartpole_train.h5",
    config="base",
    epochs=100,
    batch_size=32,
    lr=3e-4,
    lambda_reg=0.5,
)
```
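The `cartpole_train.h5` file is expected to hold episodes of pixel observations with their actions. A rough sketch of the collection step, with a random-frame placeholder instead of real `env.render()` output (frame size, array layout, and the `collect_episode` helper are assumptions; WorldKit's actual HDF5 schema may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_episode(steps=200, h=64, w=64):
    # Placeholder: the real pipeline would roll a policy in CartPole-v1
    # and store rendered frames from env.render() plus the actions taken.
    frames = rng.integers(0, 256, size=(steps, h, w, 3), dtype=np.uint8)
    actions = rng.integers(0, 2, size=(steps,), dtype=np.int64)
    return frames, actions

# 200 episodes in the real run; 4 here to keep the sketch fast.
episodes = [collect_episode() for _ in range(4)]
# These arrays would then be written to cartpole_train.h5 (e.g. via h5py).
```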

## Architecture

Based on the LeWorldModel paper (Maes et al., 2026):

- **Encoder:** Vision Transformer (ViT) with CLS token pooling
- **Predictor:** Transformer with AdaLN-Zero conditioning on actions
- **Loss:** `L_pred + lambda * SIGReg(Z)`
- **Planner:** Cross-Entropy Method (CEM) in latent space
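CEM plans by sampling candidate action sequences, scoring each one by rolling the predictor forward in latent space, and refitting the sampling distribution to the best candidates. A self-contained sketch with a toy linear dynamics standing in for the learned 192-d predictor (all names and constants here are illustrative, not WorldKit internals):

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(z0, actions):
    # Toy linear latent dynamics standing in for the learned predictor:
    # z_{t+1} = z_t + 0.1 * a_t  (the real model predicts in 192-d).
    z, traj = z0.copy(), []
    for a in actions:
        z = z + 0.1 * a
        traj.append(z.copy())
    return np.stack(traj)

def cem_plan(z0, z_goal, horizon=20, pop=64, n_elite=8, iters=10, dim=2):
    mean, std = np.zeros((horizon, dim)), np.ones((horizon, dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        samples = rng.normal(mean, std, size=(pop, horizon, dim))
        # Score each sequence by its final latent's distance to the goal.
        costs = np.array([np.linalg.norm(rollout(z0, s)[-1] - z_goal)
                          for s in samples])
        # Refit the sampling distribution to the elite sequences.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean  # best action sequence found

z0, z_goal = np.zeros(2), np.array([1.0, -0.5])
plan = cem_plan(z0, z_goal)
final = rollout(z0, plan)[-1]
```

Because CEM only needs forward rollouts and a distance in latent space, it requires no gradients through the predictor, which is why it pairs naturally with a frozen world model.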

## Citation

If you use this model, please cite WorldKit:

```bibtex
@software{worldkit,
  title = {WorldKit: The Open-Source World Model Runtime},
  author = {Bansi, Dilpreet},
  year = {2026},
  url = {https://github.com/DilpreetBansi/worldkit}
}
```

## License

MIT License. See the WorldKit LICENSE file.


Built with WorldKit | PyPI | GitHub
