---
license: cc-by-nc-4.0
tags:
- robotics
- world-model
- jepa
- planning
- pytorch
library_name: pytorch
pipeline_tag: robotics
datasets:
- facebook/jepa-wms
arxiv: '2512.24497'
---
# 🤗 JEPA-WMs Pretrained Models

This 🤗 Hugging Face repository hosts pretrained JEPA-WM world model checkpoints from the paper ["What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"](https://arxiv.org/abs/2512.24497).

📖 See the main repository for training code and datasets.
## Available Models

### JEPA-WM Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| `jepa_wm_droid` | DROID & RoboCasa | 256×256 | DINOv3 ViT-L/16 | 12 |
| `jepa_wm_metaworld` | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pusht` | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pointmaze` | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_wall` | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |
### DINO-WM Baseline Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| `dino_wm_droid` | DROID & RoboCasa | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_metaworld` | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pusht` | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pointmaze` | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_wall` | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |
### V-JEPA-2-AC Baseline Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| `vjepa2_ac_droid` | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
| `vjepa2_ac_oss` | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
### VM2M Decoder Heads

| Model | Encoder | Resolution |
|---|---|---|
| `dinov2_vits_224` | DINOv2 ViT-S/14 | 224×224 |
| `dinov2_vits_224_INet` | DINOv2 ViT-S/14 | 224×224 |
| `dinov3_vitl_256_INet` | DINOv3 ViT-L/16 | 256×256 |
| `vjepa2_vitg_256_INet` | V-JEPA-2 ViT-G/16 | 256×256 |
## Usage

### Via PyTorch Hub (Recommended)

```python
import torch

# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')

# Load a DINO-WM baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')

# Load a V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
```
### Via Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
import torch

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar",
)

# Load the checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys())
# dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
```
**Note:** This only downloads the weights. To instantiate the full model with the correct architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the jepa-wms repository and using its training/eval scripts.
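Because the checkpoint stores plain PyTorch `state_dict`s under the keys listed above, loading it into modules you have already constructed (e.g. via the jepa-wms codebase) is standard PyTorch. A minimal sketch, assuming you can build `encoder` and `predictor` modules whose architectures match the checkpoint:

```python
import torch

def load_world_model(checkpoint_path, encoder, predictor):
    """Load pretrained encoder/predictor weights into already-built modules.

    The checkpoint layout (keys 'encoder' and 'predictor' holding state dicts)
    matches what the repository's .pth.tar files report; the module
    construction itself must come from the jepa-wms codebase.
    """
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    encoder.load_state_dict(ckpt["encoder"])
    predictor.load_state_dict(ckpt["predictor"])
    # Switch to inference mode for planning/evaluation
    encoder.eval()
    predictor.eval()
    return encoder, predictor
```

This keeps checkpoint I/O separate from model construction, so the same helper works for any of the checkpoints in the tables above as long as the module shapes match.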
## Citation

```bibtex
@misc{terver2025drivessuccessphysicalplanning,
      title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
      author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
      year={2025},
      eprint={2512.24497},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.24497},
}
```
## License
These models are licensed under CC-BY-NC 4.0.
## Links

- 📄 [Paper](https://arxiv.org/abs/2512.24497)
- 💻 GitHub Repository
- 🤗 Datasets
- 🤗 Models