# WorldKit / cartpole-base
A base CartPole-v1 world model trained with WorldKit.
## Model Details
| Property | Value |
|---|---|
| Architecture | JEPA (Joint-Embedding Predictive Architecture) |
| Config | base |
| Parameters | 13M |
| Latent Dim | 192 |
| Task | CartPole balance control |
| Training | 100 epochs on 200 episodes of pixel observations |
| Best Val Loss | 0.2958 |
## Usage

```bash
pip install worldkit
```

```python
from worldkit import WorldModel

# Load this model
model = WorldModel.from_hub("DilpreetBansi/cartpole-base")

# Encode an observation
z = model.encode(observation)  # -> (192,) latent vector

# Predict future states
result = model.predict(current_frame, actions)

# Plan to reach a goal
plan = model.plan(current_frame, goal_frame, max_steps=50)

# Score physical plausibility
score = model.plausibility(video_frames)
```
## Task: CartPole-v1
The CartPole-v1 environment requires an agent to balance a pole on a cart by applying left/right forces. The world model learns to predict future visual observations from pixel inputs, enabling planning and control in latent space.
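For intuition about what the world model must capture, the underlying dynamics can be sketched with the classic cart-pole equations of motion. This is an illustrative, dependency-free simulation, not WorldKit code; the constants follow the commonly used Gym defaults and are assumptions here:

```python
import math

# Cart-pole constants (common Gym defaults; assumed for illustration)
GRAVITY, CART_M, POLE_M, POLE_HALF_LEN, DT = 9.8, 1.0, 0.1, 0.5, 0.02
FORCE_MAG = 10.0

def step(state, action):
    """One Euler step of the cart-pole dynamics.

    state  = (x, x_dot, theta, theta_dot); theta is the pole angle in radians.
    action = 0 (push left) or 1 (push right).
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_m = CART_M + POLE_M
    temp = (force + POLE_M * POLE_HALF_LEN * theta_dot**2 * math.sin(theta)) / total_m
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_M * math.cos(theta)**2 / total_m))
    x_acc = temp - POLE_M * POLE_HALF_LEN * theta_acc * math.cos(theta) / total_m
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)

# The upright pole is an unstable equilibrium: with the wrong force,
# a small tilt grows over one second of simulated time.
s = (0.0, 0.0, 0.05, 0.0)
for _ in range(50):
    s = step(s, 0)  # keep pushing left while the pole leans right
```

Learning to predict the visual consequences of these dynamics from pixels alone is what makes planning in latent space possible.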
## Training

Trained using WorldKit's built-in training pipeline on 200 episodes of pixel observations for 100 epochs:

```python
from worldkit import WorldModel

model = WorldModel.train(
    data="cartpole_train.h5",
    config="base",
    epochs=100,
    batch_size=32,
    lr=3e-4,
    lambda_reg=0.5,
)
```
## Architecture
Based on the LeWorldModel paper (Maes et al., 2026):
- Encoder: Vision Transformer (ViT) with CLS token pooling
- Predictor: Transformer with AdaLN-Zero conditioning on actions
- Loss: L_pred + lambda * SIGReg(Z)
- Planner: Cross-Entropy Method (CEM) in latent space
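To make the planner component concrete, here is a minimal, dependency-free sketch of the Cross-Entropy Method on a toy "distance to goal latent" objective. The objective, function names, and hyperparameters are hypothetical; WorldKit's actual planner scores candidate action sequences with the learned predictor rather than a hand-written cost:

```python
import random

def cem_plan(cost, dim, iters=30, pop=64, elite=8, seed=0):
    """Cross-Entropy Method: repeatedly sample candidates from a Gaussian,
    keep the lowest-cost ('elite') samples, and refit the Gaussian to them."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(iters):
        samples = [[rng.gauss(mean[i], std[i]) for i in range(dim)]
                   for _ in range(pop)]
        samples.sort(key=cost)
        top = samples[:elite]
        mean = [sum(s[i] for s in top) / elite for i in range(dim)]
        std = [max(1e-3, (sum((s[i] - mean[i]) ** 2 for s in top) / elite) ** 0.5)
               for i in range(dim)]
    return mean

# Toy objective: squared distance to a hypothetical 2-D goal latent.
goal = [0.5, -1.2]
best = cem_plan(lambda a: sum((ai - gi) ** 2 for ai, gi in zip(a, goal)), dim=2)
```

In the real planner the "cost" would be the latent-space distance between the predictor's rollout of a candidate action sequence and the encoded goal frame.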
## Links

- PyPI: https://pypi.org/project/worldkit
- GitHub: https://github.com/DilpreetBansi/worldkit
## Citation

If you use this model, please cite WorldKit:

```bibtex
@software{worldkit,
  title  = {WorldKit: The Open-Source World Model Runtime},
  author = {Bansi, Dilpreet},
  year   = {2026},
  url    = {https://github.com/DilpreetBansi/worldkit}
}
```
## License

MIT License. See the LICENSE file in the WorldKit repository.