---
license: cc-by-nc-4.0
tags:
- robotics
- world-model
- jepa
- planning
- pytorch
library_name: pytorch
pipeline_tag: robotics
datasets:
- facebook/jepa-wms
arxiv: "2512.24497"
---
<h1 align="center">
<p>🤗 <b>JEPA-WMs Pretrained Models</b></p>
</h1>
<div align="center" style="line-height: 1;">
<a href="https://github.com/facebookresearch/jepa-wms" target="_blank" style="margin: 2px;"><img alt="Github" src="https://img.shields.io/badge/Github-facebookresearch%2Fjepa--wms-black?logo=github" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/facebook/jepa-wms" target="_blank" style="margin: 2px;"><img alt="HuggingFace" src="https://img.shields.io/badge/🤗%20HuggingFace-facebook%2Fjepa--wms-ffc107" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://arxiv.org/abs/2512.24497" target="_blank" style="margin: 2px;"><img alt="ArXiv" src="https://img.shields.io/badge/arXiv-2512.24497-b5212f?logo=arxiv" style="display: inline-block; vertical-align: middle;"/></a>
</div>
<br>
<p align="center">
<b><a href="https://ai.facebook.com/research/">Meta AI Research, FAIR</a></b>
</p>
<p align="center">
This 🤗 HuggingFace repository hosts pretrained <b>JEPA-WM</b> world models.<br>
See the <a href="https://github.com/facebookresearch/jepa-wms">main repository</a> for training code and datasets.
</p>
This repository contains the pretrained world model checkpoints from the paper
["What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"](https://arxiv.org/abs/2512.24497).
## Available Models
### JEPA-WM Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `jepa_wm_droid` | DROID & RoboCasa | 256Γ256 | DINOv3 ViT-L/16 | 12 |
| `jepa_wm_metaworld` | Metaworld | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pusht` | Push-T | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pointmaze` | PointMaze | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_wall` | Wall | 224Γ224 | DINOv2 ViT-S/14 | 6 |
### DINO-WM Baseline Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `dino_wm_droid` | DROID & RoboCasa | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_metaworld` | Metaworld | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pusht` | Push-T | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pointmaze` | PointMaze | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_wall` | Wall | 224Γ224 | DINOv2 ViT-S/14 | 6 |
### V-JEPA-2-AC Baseline Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `vjepa2_ac_droid` | DROID & RoboCasa | 256Γ256 | V-JEPA-2 ViT-G/16 | 24 |
| `vjepa2_ac_oss` | DROID & RoboCasa | 256Γ256 | V-JEPA-2 ViT-G/16 | 24 |
### VM2M Decoder Heads
| Model | Encoder | Resolution |
|-------|---------|------------|
| `dinov2_vits_224` | DINOv2 ViT-S/14 | 224Γ224 |
| `dinov2_vits_224_INet` | DINOv2 ViT-S/14 | 224Γ224 |
| `dinov3_vitl_256_INet` | DINOv3 ViT-L/16 | 256Γ256 |
| `vjepa2_vitg_256_INet` | V-JEPA-2 ViT-G/16 | 256Γ256 |
## Usage
### Via PyTorch Hub (Recommended)
```python
import torch
# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')
# Load DINO-WM baselines
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')
# Load V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
```
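The objects returned by `torch.hub.load` are standard `torch.nn.Module`s, so the usual PyTorch utilities apply. A minimal sketch (the parameter-counting helper and the toy stand-in module below are illustrative, not part of the released API; the model's forward/rollout signature is documented in the main repository):

```python
import torch

def count_parameters(module: torch.nn.Module) -> int:
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# With a loaded world model (sketch):
#   model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   model = model.to(device).eval()
#   print(f"{count_parameters(model) / 1e6:.1f}M parameters")

# Demonstrated here on a small stand-in module:
toy = torch.nn.Linear(16, 32)
print(count_parameters(toy))  # 16*32 weights + 32 biases = 544
```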
### Via Hugging Face Hub
```python
from huggingface_hub import hf_hub_download
import torch
# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar",
)
# Load checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys()) # dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
```
> **Note**: This only downloads the weights. To instantiate the full model with the correct
> architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the
> [jepa-wms repository](https://github.com/facebookresearch/jepa-wms) and using the training/eval scripts.
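If you do work with the raw checkpoint, a quick sanity check is to list the tensor shapes in each state dict. A small sketch (the helper and toy state dict below are illustrative, assuming only the key layout shown above):

```python
import torch

def summarize_state_dict(sd):
    """Map each state-dict key to its tensor shape."""
    return {k: tuple(v.shape) for k, v in sd.items()}

# With a downloaded checkpoint (sketch):
#   checkpoint = torch.load(checkpoint_path, map_location="cpu")
#   print(summarize_state_dict(checkpoint["encoder"]))

# Demonstrated here on a toy module's state dict:
toy_sd = torch.nn.Linear(4, 2).state_dict()
print(summarize_state_dict(toy_sd))  # {'weight': (2, 4), 'bias': (2,)}
```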
## Citation
```bibtex
@misc{terver2025drivessuccessphysicalplanning,
  title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
  author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
  year={2025},
  eprint={2512.24497},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.24497},
}
```
## License
These models are licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
## Links
- 📄 [Paper](https://arxiv.org/abs/2512.24497)
- 💻 [GitHub Repository](https://github.com/facebookresearch/jepa-wms)
- 🤗 [Datasets](https://huggingface.co/datasets/facebook/jepa-wms)
- 🤗 [Models](https://huggingface.co/facebook/jepa-wms)