---
license: mit
tags:
- worldkit
- world-model
- jepa
- robotics
- push-t
- planning
library_name: worldkit
pipeline_tag: reinforcement-learning
---
# WorldKit / pusht-base
A **base** world model trained on the Push-T task using [WorldKit](https://github.com/DilpreetBansi/worldkit).
## Model Details
| Property | Value |
|----------|-------|
| Architecture | JEPA (Joint-Embedding Predictive Architecture) |
| Config | `base` |
| Parameters | 13M |
| Latent Dim | 192 |
| Image Size | 96x96 |
| Action Dim | 2 (dx, dy) |
| File Size | 50.2 MB |
| Training Time | 2 minutes (Apple M4 Pro, MPS) |
| Best Val Loss | 0.3500 |
## Usage
```bash
pip install worldkit
```
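```python
from worldkit import WorldModel

# Load this model from the Hugging Face Hub
model = WorldModel.from_hub("DilpreetBansi/pusht-base")

# `observation`, `current_frame`, `goal_frame`, `actions`, and `video_frames`
# are placeholders for your own data: frames are 96x96 RGB images and each
# action is a 2D (dx, dy) vector.

# Encode an observation
z = model.encode(observation)  # -> (192,) latent vector

# Predict future states from a frame and a sequence of actions
result = model.predict(current_frame, actions)

# Plan to reach a goal
plan = model.plan(current_frame, goal_frame, max_steps=50)

# Score physical plausibility
score = model.plausibility(video_frames)
```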
## Task: Push-T
The Push-T task is a 2D manipulation environment where an agent (blue circle) pushes a T-shaped block (red) toward a target position. Observations are 96x96 RGB images and actions are 2D continuous (dx, dy).
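To make the shapes above concrete, here is a minimal NumPy sketch of one observation and a short action sequence. The `uint8` dtype, the height-width-channel layout, the action range of [-1, 1], and the horizon of 10 are assumptions for illustration only; check the WorldKit documentation for the input format the library actually expects.

```python
import numpy as np

rng = np.random.default_rng(0)

# One observation: a 96x96 RGB image.
# Layout (height, width, channels) and dtype uint8 are assumptions.
observation = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)

# A short action sequence: each row is one continuous (dx, dy) step.
# The horizon and the [-1, 1] range are illustrative, not taken from the model card.
horizon = 10
actions = rng.uniform(-1.0, 1.0, size=(horizon, 2)).astype(np.float32)

print(observation.shape, observation.dtype)
print(actions.shape, actions.dtype)
```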
## Training
Trained using WorldKit's built-in training pipeline:
```python
from worldkit import WorldModel

model = WorldModel.train(
    data="pusht_train.h5",
    config="base",
    epochs=50,
    batch_size=32,
    lr=3e-4,
    lambda_reg=0.5,
    action_dim=2,
)
```
## Architecture
Based on the LeWorldModel paper (Maes et al., 2026):
- **Encoder**: Vision Transformer (ViT) with CLS token pooling
- **Predictor**: Transformer with AdaLN-Zero conditioning on actions
- **Loss**: `L_pred + lambda * SIGReg(Z)`, where `lambda` weights the regularizer (presumably the `lambda_reg` training argument)
- **Planner**: Cross-Entropy Method (CEM) in latent space
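The planner item above can be illustrated with a generic Cross-Entropy Method loop. This is a minimal sketch, not WorldKit's implementation: `toy_predict` is a hypothetical stand-in for the learned predictor (it simply shifts the first two latent dimensions by the action), the latent is 4-dimensional instead of 192, and the names `cem_plan`, `n_samples`, `n_elites`, and `n_iters` are illustrative.

```python
import numpy as np

def toy_predict(z, action):
    """Hypothetical stand-in for a learned latent predictor.

    Moves the first two latent dimensions by the (dx, dy) action.
    """
    z_next = z.copy()
    z_next[..., :2] += action
    return z_next

def cem_plan(z0, z_goal, horizon=5, n_samples=512, n_elites=32, n_iters=10, seed=0):
    """Plan an action sequence with the Cross-Entropy Method in latent space."""
    rng = np.random.default_rng(seed)
    action_dim = 2
    # Diagonal Gaussian over action sequences, refined at every iteration.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # 1. Sample candidate action sequences from the current distribution.
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        candidates = mean + std * noise
        # 2. Roll every candidate forward through the latent dynamics.
        z = np.repeat(z0[None, :], n_samples, axis=0)
        for t in range(horizon):
            z = toy_predict(z, candidates[:, t])
        # 3. Score by squared distance between final latent and goal latent.
        costs = np.sum((z - z_goal) ** 2, axis=1)
        # 4. Refit the distribution to the lowest-cost (elite) candidates.
        elites = candidates[np.argsort(costs)[:n_elites]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean

# Usage: plan from the origin toward a goal latent, then replay the plan.
z0 = np.zeros(4)
z_goal = np.array([1.0, -0.5, 0.0, 0.0])
plan = cem_plan(z0, z_goal)

z_final = z0.copy()
for action in plan:
    z_final = toy_predict(z_final, action)

print("plan shape:", plan.shape)
print("distance to goal:", np.linalg.norm(z_final - z_goal))
```

In a real setting the start and goal latents would come from encoding the current and goal frames, and the rollout would call the trained predictor instead of `toy_predict`.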
## Citation
If you use this model, please cite WorldKit and the LeWorldModel paper:
```bibtex
@software{worldkit,
  title = {WorldKit: The Open-Source World Model Runtime},
  author = {Bansi, Dilpreet},
  year = {2026},
  url = {https://github.com/DilpreetBansi/worldkit}
}
```
## License
MIT License. See [WorldKit LICENSE](https://github.com/DilpreetBansi/worldkit/blob/main/LICENSE).
---
Built with [WorldKit](https://github.com/DilpreetBansi/worldkit) by [Dilpreet Bansi](https://github.com/DilpreetBansi).