---
license: mit
tags:
- worldkit
- world-model
- jepa
- robotics
- push-t
- planning
library_name: worldkit
pipeline_tag: reinforcement-learning
---

# WorldKit / pusht-base

A **base** world model trained on the Push-T task using [WorldKit](https://github.com/DilpreetBansi/worldkit).

## Model Details

| Property | Value |
|----------|-------|
| Architecture | JEPA (Joint-Embedding Predictive Architecture) |
| Config | `base` |
| Parameters | 13M |
| Latent Dim | 192 |
| Image Size | 96x96 |
| Action Dim | 2 (dx, dy) |
| File Size | 50.2 MB |
| Training Time | 2 minutes (Apple M4 Pro, MPS) |
| Best Val Loss | 0.3500 |

## Usage

```bash
pip install worldkit
```

```python
from worldkit import WorldModel

# Load this model
model = WorldModel.from_hub("DilpreetBansi/pusht-base")

# Encode an observation
z = model.encode(observation)  # -> (192,) latent vector

# Predict future states
result = model.predict(current_frame, actions)

# Plan to reach a goal
plan = model.plan(current_frame, goal_frame, max_steps=50)

# Score physical plausibility
score = model.plausibility(video_frames)
```

## Task: Push-T

The Push-T task is a 2D manipulation environment where an agent (blue circle) pushes a T-shaped block (red) toward a target position. Observations are 96x96 RGB images and actions are 2D continuous (dx, dy).

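
To make the observation and action shapes described above concrete, the following is a minimal sketch that builds placeholder data with NumPy. The height-width-channel layout, the `uint8` pixel type, the `float32` action type, and the `[-1, 1]` action range are all assumptions made for illustration; this card does not state the exact input format WorldKit expects, and the helper names are hypothetical.

```python
import numpy as np

IMAGE_SIZE = 96  # from this card: 96x96 RGB observations
ACTION_DIM = 2   # from this card: (dx, dy)


def make_dummy_observation(rng):
    """Random image standing in for a rendered Push-T frame.

    Assumption: height x width x channel layout with uint8 pixels.
    """
    return rng.integers(0, 256, size=(IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.uint8)


def make_dummy_actions(rng, horizon, max_step=1.0):
    """Random (dx, dy) displacements, one row per time step.

    Assumption: actions lie in [-max_step, max_step]; the real range
    depends on how the dataset was recorded.
    """
    actions = rng.uniform(-max_step, max_step, size=(horizon, ACTION_DIM))
    return actions.astype(np.float32)


rng = np.random.default_rng(0)
observation = make_dummy_observation(rng)
actions = make_dummy_actions(rng, horizon=10)

print(observation.shape, observation.dtype)  # (96, 96, 3) uint8
print(actions.shape, actions.dtype)          # (10, 2) float32
```

Arrays like these are only useful for checking shapes end to end; meaningful predictions require real Push-T frames and recorded actions.
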
## Training

Trained using WorldKit's built-in training pipeline:

```python
from worldkit import WorldModel

model = WorldModel.train(
    data="pusht_train.h5",
    config="base",
    epochs=50,
    batch_size=32,
    lr=3e-4,
    lambda_reg=0.5,
    action_dim=2,
)
```

## Architecture

Based on the LeWorldModel paper (Maes et al., 2026):

- **Encoder**: Vision Transformer (ViT) with CLS token pooling
- **Predictor**: Transformer with AdaLN-Zero conditioning on actions
- **Loss**: `L_pred + lambda * SIGReg(Z)`
- **Planner**: Cross-Entropy Method (CEM) in latent space

## Citation

If you use this model, please cite WorldKit and the LeWorldModel paper:

```bibtex
@software{worldkit,
  title = {WorldKit: The Open-Source World Model Runtime},
  author = {Bansi, Dilpreet},
  year = {2026},
  url = {https://github.com/DilpreetBansi/worldkit}
}
```

## License

MIT License. See [WorldKit LICENSE](https://github.com/DilpreetBansi/worldkit/blob/main/LICENSE).

---

Built with [WorldKit](https://github.com/DilpreetBansi/worldkit) by [Dilpreet Bansi](https://github.com/DilpreetBansi).