LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Paper: arXiv:2603.19312
Author: Santosh Jaiswal (@hellojais)
Base architecture: LeWM by Lucas Maes et al. (2025)
Training data: hellojais/billiards-worldmodel
Code: hellojais/le-wm
| File | embed_dim | λ (SIGReg) | Best epoch | val/pred_loss | Notes |
|---|---|---|---|---|---|
| lewm_epoch_8_object.ckpt | 192 | 0.09 | 8 | 0.00946 | Best overall validation loss |
| lewm_small_epoch_8_object.ckpt | 32 | 0.01 | 8 | 0.00280 | Best prediction accuracy (2.6× better) |
Trained on 4,000 episodes (971,321 frames) of 2D billiards gameplay. The model learned to predict future frame embeddings from current embeddings and actions — encoding billiards physics purely from pixels.
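The embedding-space prediction loop can be sketched as follows. This is a toy NumPy stand-in, not the actual LeWM modules: the linear encoder/predictor, the 64×64 frame size, and the 2-D action are all illustrative assumptions (only embed_dim=32 comes from the table above).

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, ACTION_DIM = 32, 2  # lewm_small uses embed_dim=32; action dim is illustrative

# Toy linear maps standing in for the real encoder and predictor networks.
W_enc = rng.standard_normal((EMBED_DIM, 64 * 64)) * 0.01               # frame -> z_t
W_pred = rng.standard_normal((EMBED_DIM, EMBED_DIM + ACTION_DIM)) * 0.1  # (z_t, a_t) -> z_{t+1}

def encode(frame):
    """Map a flattened pixel frame to an embedding z_t."""
    return W_enc @ frame.ravel()

def predict(z, action):
    """Predict the next frame's embedding from the current embedding and action."""
    return W_pred @ np.concatenate([z, action])

# Roll the predictor forward in embedding space, never decoding back to pixels.
frame = rng.random((64, 64))
z = encode(frame)
for t in range(5):
    z = predict(z, np.zeros(ACTION_DIM))

print(z.shape)
```

The point of the JEPA setup is the last loop: multi-step rollouts happen entirely in the 32-dimensional embedding space, which is what makes planning over predicted futures cheap.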
Probe results (lewm_small):
| Approach | Same-episode | Novel cross-episode |
|---|---|---|
| Pure JEPA embedding CEM | ❌ FAIL | ❌ FAIL |
| State-based hybrid CEM | ✅ SUCCESS (9 steps) | ✅ SUCCESS (13 steps) |
Pure JEPA planning failed due to uniform embedding geometry in this visually simple domain. See FINDINGS.md for complete analysis.
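For readers unfamiliar with CEM planning, the procedure looks roughly like this. The sketch below uses a toy 1-D integrator in place of the world model; the function name, dynamics, and hyperparameters are illustrative assumptions, not the repo's API.

```python
import numpy as np

def cem_plan(start, goal, horizon=10, pop=64, elites=8, iters=20, seed=0):
    """Cross-entropy method over action sequences for toy dynamics x' = x + a."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(horizon)    # mean of the action-sequence distribution
    sigma = np.ones(horizon)  # std of the action-sequence distribution
    for _ in range(iters):
        # Sample candidate action sequences, roll out, and score by goal distance.
        actions = rng.normal(mu, sigma, size=(pop, horizon))
        states = start + np.cumsum(actions, axis=1)
        costs = np.abs(states[:, -1] - goal)
        # Refit the sampling distribution to the lowest-cost (elite) sequences.
        elite = actions[np.argsort(costs)[:elites]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

plan = cem_plan(start=0.0, goal=3.0)
final = 0.0 + plan.sum()
print(round(final, 2))
```

In the hybrid planner, the cost would instead be computed from rollouts of the learned dynamics combined with ground-truth state; in the pure-JEPA variant, the cost is a distance in embedding space, which is where the uniform embedding geometry caused planning to fail.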
```python
# Load checkpoint
import stable_worldmodel as swm
import torch

device = torch.device("mps")  # or "cuda" or "cpu"

# Load the small model (recommended)
checkpoint = torch.load(
    "lewm_small_epoch_8_object.ckpt",
    map_location=device,
)
```
Original LeWM architecture by:
Lucas Maes, Quentin Leroux, Gauthier Gidel, Glen Berseth
Mila / McGill University (2025)
arXiv:2603.19312