---
license: mit
tags:
- worldkit
- world-model
- jepa
- robotics
- push-t
- planning
library_name: worldkit
pipeline_tag: reinforcement-learning
---

# WorldKit / pusht-base

A **base** world model trained on the Push-T task using [WorldKit](https://github.com/DilpreetBansi/worldkit).

## Model Details

| Property | Value |
|----------|-------|
| Architecture | JEPA (Joint-Embedding Predictive Architecture) |
| Config | `base` |
| Parameters | 13M |
| Latent Dim | 192 |
| Image Size | 96x96 |
| Action Dim | 2 (dx, dy) |
| File Size | 50.2 MB |
| Training Time | 2 minutes (Apple M4 Pro, MPS) |
| Best Val Loss | 0.3500 |
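
As a quick sanity check on the table, the file size and parameter count can be compared with a bit of arithmetic. The sketch below assumes the checkpoint stores float32 weights (4 bytes each), that "MB" means 10^6 bytes, and that metadata overhead is negligible; none of this is stated in the card, and `estimate_param_count` is a hypothetical helper, not part of WorldKit.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def estimate_param_count(file_size_bytes, bytes_per_param=BYTES_PER_FLOAT32):
    """Rough parameter count implied by a checkpoint size, ignoring metadata."""
    return file_size_bytes / bytes_per_param


# 50.2 MB from the table, reading MB as 10^6 bytes (assumption).
file_size_bytes = 50.2e6
approx_params_millions = estimate_param_count(file_size_bytes) / 1e6
print(f"{approx_params_millions:.2f}M parameters")  # prints "12.55M parameters"
```

Under those assumptions the estimate rounds to the 13M listed above.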

## Usage

```bash
pip install worldkit
```

```python
from worldkit import WorldModel

# Load this model
model = WorldModel.from_hub("DilpreetBansi/pusht-base")

# `observation`, `current_frame`, `goal_frame`, `actions`, and `video_frames`
# are supplied by the caller; frames are 96x96 RGB images and actions are (dx, dy).

# Encode an observation
z = model.encode(observation)  # -> (192,) latent vector

# Predict future states
result = model.predict(current_frame, actions)

# Plan to reach a goal
plan = model.plan(current_frame, goal_frame, max_steps=50)

# Score physical plausibility
score = model.plausibility(video_frames)
```

## Task: Push-T

The Push-T task is a 2D manipulation environment where an agent (blue circle) pushes a T-shaped block (red) toward a target position. Observations are 96x96 RGB images and actions are 2D continuous (dx, dy).
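
To make the observation and action shapes concrete, here is a minimal NumPy sketch that builds a placeholder frame and a random action sequence for smoke-testing. The height-width-channel layout, the `uint8` dtype, the action range, and the helper names `make_dummy_observation` and `random_actions` are all assumptions for illustration; check what WorldKit actually expects before feeding these to the model.

```python
import numpy as np

IMAGE_SIZE = 96  # from the model card: 96x96 RGB observations
ACTION_DIM = 2   # from the model card: (dx, dy)


def make_dummy_observation(agent_xy=(48, 48), radius=5):
    """White 96x96 RGB frame with a blue disc standing in for the agent.

    Assumed layout: (height, width, channels), dtype uint8.
    """
    frame = np.full((IMAGE_SIZE, IMAGE_SIZE, 3), 255, dtype=np.uint8)
    yy, xx = np.mgrid[0:IMAGE_SIZE, 0:IMAGE_SIZE]
    inside = (xx - agent_xy[0]) ** 2 + (yy - agent_xy[1]) ** 2 <= radius ** 2
    frame[inside] = (0, 0, 255)  # blue agent
    return frame


def random_actions(horizon, max_step=1.0, seed=0):
    """Sample `horizon` continuous (dx, dy) actions in [-max_step, max_step]."""
    rng = np.random.default_rng(seed)
    actions = rng.uniform(-max_step, max_step, size=(horizon, ACTION_DIM))
    return actions.astype(np.float32)


frame = make_dummy_observation()
actions = random_actions(horizon=50)
print(frame.shape, actions.shape)
```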

## Training

Trained using WorldKit's built-in training pipeline:

```python
from worldkit import WorldModel

model = WorldModel.train(
    data="pusht_train.h5",
    config="base",
    epochs=50,
    batch_size=32,
    lr=3e-4,
    lambda_reg=0.5,
    action_dim=2,
)
```
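
If `lambda_reg` plays the role of the lambda weight in the loss listed under Architecture (`L_pred + lambda * SIGReg(Z)`), which is an assumption based on the name, the objective has the shape sketched below. The regularizer here is a simplified moment-matching stand-in on random 1D projections, not SIGReg itself, and all function names are hypothetical rather than WorldKit's.

```python
import numpy as np


def prediction_loss(z_pred, z_target):
    """Mean squared error between predicted and target latents."""
    return float(np.mean((z_pred - z_target) ** 2))


def moment_regularizer(z, num_directions=64, seed=0):
    """Stand-in regularizer (NOT SIGReg): project embeddings onto random unit
    directions and penalize deviation from zero mean and unit variance."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((z.shape[1], num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    projected = z @ directions  # (batch, num_directions)
    mean = projected.mean(axis=0)
    var = projected.var(axis=0)
    return float(np.mean(mean ** 2 + (var - 1.0) ** 2))


def total_loss(z_pred, z_target, z_embed, lambda_reg=0.5):
    """L_pred + lambda * regularizer(Z), with the stand-in regularizer."""
    return prediction_loss(z_pred, z_target) + lambda_reg * moment_regularizer(z_embed)


rng = np.random.default_rng(0)
z_embed = rng.standard_normal((4096, 192))  # batch of 192-dim latents
collapsed = np.zeros((4096, 192))           # fully collapsed embeddings
print(moment_regularizer(z_embed), moment_regularizer(collapsed))
```

The stand-in captures the qualitative role of such a term: collapsed embeddings are penalized while roughly isotropic Gaussian ones are not.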

## Architecture

Based on the LeWorldModel paper (Maes et al., 2026):
- **Encoder**: Vision Transformer (ViT) with CLS token pooling
- **Predictor**: Transformer with AdaLN-Zero conditioning on actions
- **Loss**: L_pred + lambda * SIGReg(Z)
- **Planner**: Cross-Entropy Method (CEM) in latent space
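
The planner bullet above can be illustrated with a generic Cross-Entropy Method loop. This is a minimal sketch, not WorldKit's planner: `cem_plan`, `toy_step`, and `rollout` are hypothetical names, the toy dynamics replace the learned predictor, and the 2-dimensional latent is chosen only to keep the example small.

```python
import numpy as np


def cem_plan(z0, z_goal, step_fn, horizon=10, action_dim=2,
             population=256, num_elites=32, iterations=8, seed=0):
    """Search for an action sequence whose rollout ends near z_goal."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        # Sample candidate action sequences around the current distribution.
        noise = rng.standard_normal((population, horizon, action_dim))
        actions = mean + std * noise
        # Roll every candidate forward in latent space (batched over population).
        z = np.repeat(z0[None, :], population, axis=0)
        for t in range(horizon):
            z = step_fn(z, actions[:, t])
        # Cost: squared distance between the final latent and the goal latent.
        costs = np.sum((z - z_goal) ** 2, axis=1)
        # Refit the sampling distribution to the lowest-cost candidates.
        elites = actions[np.argsort(costs)[:num_elites]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean


def toy_step(z, action):
    """Toy latent dynamics standing in for the learned predictor."""
    return z + 0.1 * action


def rollout(z0, actions, step_fn):
    """Apply a single action sequence step by step."""
    z = z0
    for a in actions:
        z = step_fn(z, a)
    return z


z0 = np.zeros(2)
z_goal = np.array([0.5, -0.3])
plan = cem_plan(z0, z_goal, toy_step)
final = rollout(z0, plan, toy_step)
print(np.linalg.norm(final - z_goal))
```

In a real setup, `z0` and `z_goal` would come from encoding the current and goal frames, and `step_fn` would wrap the action-conditioned predictor.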

## Citation

If you use this model, please cite WorldKit and the LeWorldModel paper:

```bibtex
@software{worldkit,
  title = {WorldKit: The Open-Source World Model Runtime},
  author = {Bansi, Dilpreet},
  year = {2026},
  url = {https://github.com/DilpreetBansi/worldkit}
}
```

## License

MIT License. See [WorldKit LICENSE](https://github.com/DilpreetBansi/worldkit/blob/main/LICENSE).

---

Built with [WorldKit](https://github.com/DilpreetBansi/worldkit) by [Dilpreet Bansi](https://github.com/DilpreetBansi).