---
license: cc-by-nc-4.0
tags:
- robotics
- world-model
- jepa
- planning
- pytorch
library_name: pytorch
pipeline_tag: robotics
datasets:
- facebook/jepa-wms
arxiv: "2512.24497"
---
<h1 align="center">
<p>πŸ€– <b>JEPA-WMs Pretrained Models</b></p>
</h1>
<div align="center" style="line-height: 1;">
<a href="https://github.com/facebookresearch/jepa-wms" target="_blank" style="margin: 2px;"><img alt="Github" src="https://img.shields.io/badge/Github-facebookresearch%2Fjepa--wms-black?logo=github" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/facebook/jepa-wms" target="_blank" style="margin: 2px;"><img alt="HuggingFace" src="https://img.shields.io/badge/πŸ€—%20HuggingFace-facebook%2Fjepa--wms-ffc107" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://arxiv.org/abs/2512.24497" target="_blank" style="margin: 2px;"><img alt="ArXiv" src="https://img.shields.io/badge/arXiv-2512.24497-b5212f?logo=arxiv" style="display: inline-block; vertical-align: middle;"/></a>
</div>
<br>
<p align="center">
<b><a href="https://ai.facebook.com/research/">Meta AI Research, FAIR</a></b>
</p>
<p align="center">
This πŸ€— HuggingFace repository hosts pretrained <b>JEPA-WM</b> world models.<br>
πŸ‘‰ See the <a href="https://github.com/facebookresearch/jepa-wms">main repository</a> for training code and datasets.
</p>
This repository contains pretrained world model checkpoints from the paper
["What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"](https://arxiv.org/abs/2512.24497).
## Available Models
### JEPA-WM Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `jepa_wm_droid` | DROID & RoboCasa | 256Γ—256 | DINOv3 ViT-L/16 | 12 |
| `jepa_wm_metaworld` | Metaworld | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pusht` | Push-T | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pointmaze` | PointMaze | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_wall` | Wall | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
### DINO-WM Baseline Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `dino_wm_droid` | DROID & RoboCasa | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_metaworld` | Metaworld | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pusht` | Push-T | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pointmaze` | PointMaze | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_wall` | Wall | 224Γ—224 | DINOv2 ViT-S/14 | 6 |
### V-JEPA-2-AC Baseline Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `vjepa2_ac_droid` | DROID & RoboCasa | 256Γ—256 | V-JEPA-2 ViT-G/16 | 24 |
| `vjepa2_ac_oss` | DROID & RoboCasa | 256Γ—256 | V-JEPA-2 ViT-G/16 | 24 |
### VM2M Decoder Heads
| Model | Encoder | Resolution |
|-------|---------|------------|
| `dinov2_vits_224` | DINOv2 ViT-S/14 | 224Γ—224 |
| `dinov2_vits_224_INet` | DINOv2 ViT-S/14 | 224Γ—224 |
| `dinov3_vitl_256_INet` | DINOv3 ViT-L/16 | 256Γ—256 |
| `vjepa2_vitg_256_INet` | V-JEPA-2 ViT-G/16 | 256Γ—256 |
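Each model above corresponds to a checkpoint file stored in this repository (for example, `jepa_wm_droid.pth.tar`, used in the Usage section below). To see the exact filenames before downloading anything, you can list them programmatically; a minimal sketch using `huggingface_hub`:

```python
from huggingface_hub import list_repo_files

# List every file hosted in the facebook/jepa-wms model repository
files = list_repo_files("facebook/jepa-wms")
print("\n".join(sorted(files)))
```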
## Usage
### Via PyTorch Hub (Recommended)
```python
import torch
# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')
# Load DINO-WM baselines
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')
# Load V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
```
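The model names in the tables above double as PyTorch Hub entrypoints. Assuming the GitHub repository ships a standard `hubconf.py` (which the calls above rely on), you can enumerate and inspect the available entrypoints directly:

```python
import torch

# Enumerate the entrypoints exposed by the repository's hubconf.py
print(torch.hub.list('facebookresearch/jepa-wms'))

# Show the docstring of a specific entrypoint, if the repository provides one
print(torch.hub.help('facebookresearch/jepa-wms', 'jepa_wm_droid'))
```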
### Via Hugging Face Hub
```python
from huggingface_hub import hf_hub_download
import torch
# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar"
)
# Load checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys()) # dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
```
> **Note**: This only downloads the weights. To instantiate the full model with the correct
> architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the
> [jepa-wms repository](https://github.com/facebookresearch/jepa-wms) and using the training/eval scripts.
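If you only need the raw weights, for example to reuse just the encoder in your own code, the components of the loaded checkpoint can be inspected individually. A minimal sketch that continues from the `checkpoint` loaded above; the parameter names inside each state dict depend on the model classes defined in the jepa-wms repository:

```python
# Report the size of each component's state dict
for component in ("encoder", "predictor", "heads"):
    state_dict = checkpoint[component]
    n_params = sum(v.numel() for v in state_dict.values())
    print(f"{component}: {len(state_dict)} tensors, {n_params / 1e6:.1f}M parameters")
```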
## Citation
```bibtex
@misc{terver2025drivessuccessphysicalplanning,
title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
year={2025},
eprint={2512.24497},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.24497},
}
```
## License
These models are licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
## Links
- πŸ“„ [Paper](https://arxiv.org/abs/2512.24497)
- πŸ’» [GitHub Repository](https://github.com/facebookresearch/jepa-wms)
- πŸ€— [Datasets](https://huggingface.co/datasets/facebook/jepa-wms)
- πŸ€— [Models](https://huggingface.co/facebook/jepa-wms)