---
license: cc-by-nc-4.0
tags:
- robotics
- world-model
- jepa
- planning
- pytorch
library_name: pytorch
pipeline_tag: robotics
datasets:
- facebook/jepa-wms
arxiv: "2512.24497"
---
<h1 align="center">
<p>🤗 <b>JEPA-WMs Pretrained Models</b></p>
</h1>
<div align="center" style="line-height: 1;">
<a href="https://github.com/facebookresearch/jepa-wms" target="_blank" style="margin: 2px;"><img alt="Github" src="https://img.shields.io/badge/Github-facebookresearch%2Fjepa--wms-black?logo=github" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/facebook/jepa-wms" target="_blank" style="margin: 2px;"><img alt="HuggingFace" src="https://img.shields.io/badge/🤗%20HuggingFace-facebook%2Fjepa--wms-ffc107" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://arxiv.org/abs/2512.24497" target="_blank" style="margin: 2px;"><img alt="ArXiv" src="https://img.shields.io/badge/arXiv-2512.24497-b5212f?logo=arxiv" style="display: inline-block; vertical-align: middle;"/></a>
</div>
<br>
<p align="center">
<b><a href="https://ai.facebook.com/research/">Meta AI Research, FAIR</a></b>
</p>
<p align="center">
This 🤗 HuggingFace repository hosts pretrained <b>JEPA-WM</b> world models.<br>
See the <a href="https://github.com/facebookresearch/jepa-wms">main repository</a> for training code and datasets.
</p>
This repository contains the pretrained world model checkpoints from the paper
["What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"](https://arxiv.org/abs/2512.24497).
## Available Models
### JEPA-WM Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `jepa_wm_droid` | DROID & RoboCasa | 256Γ256 | DINOv3 ViT-L/16 | 12 |
| `jepa_wm_metaworld` | Metaworld | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pusht` | Push-T | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pointmaze` | PointMaze | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_wall` | Wall | 224Γ224 | DINOv2 ViT-S/14 | 6 |
### DINO-WM Baseline Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `dino_wm_droid` | DROID & RoboCasa | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_metaworld` | Metaworld | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pusht` | Push-T | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pointmaze` | PointMaze | 224Γ224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_wall` | Wall | 224Γ224 | DINOv2 ViT-S/14 | 6 |
### V-JEPA-2-AC Baseline Models
| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `vjepa2_ac_droid` | DROID & RoboCasa | 256Γ256 | V-JEPA-2 ViT-G/16 | 24 |
| `vjepa2_ac_oss` | DROID & RoboCasa | 256Γ256 | V-JEPA-2 ViT-G/16 | 24 |
### VM2M Decoder Heads
| Model | Encoder | Resolution |
|-------|---------|------------|
| `dinov2_vits_224` | DINOv2 ViT-S/14 | 224Γ224 |
| `dinov2_vits_224_INet` | DINOv2 ViT-S/14 | 224Γ224 |
| `dinov3_vitl_256_INet` | DINOv3 ViT-L/16 | 256Γ256 |
| `vjepa2_vitg_256_INet` | V-JEPA-2 ViT-G/16 | 256Γ256 |
## Usage
### Via PyTorch Hub (Recommended)
```python
import torch
# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')
# Load DINO-WM baselines
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')
# Load V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
```
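The objects returned by `torch.hub.load` are standard `torch.nn.Module`s, so the usual PyTorch utilities apply. A minimal sketch (the parameter-counting helper and the toy stand-in module below are illustrative, not part of the released API; the model's forward/rollout signature is documented in the main repository):

```python
import torch

def count_parameters(module: torch.nn.Module) -> int:
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# With a loaded world model (sketch):
#   model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
#   device = "cuda" if torch.cuda.is_available() else "cpu"
#   model = model.to(device).eval()
#   print(f"{count_parameters(model) / 1e6:.1f}M parameters")

# Demonstrated here on a small stand-in module:
toy = torch.nn.Linear(16, 32)
print(count_parameters(toy))  # 16*32 weights + 32 biases = 544
```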
### Via Hugging Face Hub
```python
from huggingface_hub import hf_hub_download
import torch
# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar",
)
# Load checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys()) # dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
```
> **Note**: This only downloads the weights. To instantiate the full model with the correct
> architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the
> [jepa-wms repository](https://github.com/facebookresearch/jepa-wms) and using the training/eval scripts.
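If you do work with the raw checkpoint, a quick sanity check is to list the tensor shapes in each state dict. A small sketch (the helper and toy state dict below are illustrative, assuming only the key layout shown above):

```python
import torch

def summarize_state_dict(sd):
    """Map each state-dict key to its tensor shape."""
    return {k: tuple(v.shape) for k, v in sd.items()}

# With a downloaded checkpoint (sketch):
#   checkpoint = torch.load(checkpoint_path, map_location="cpu")
#   print(summarize_state_dict(checkpoint["encoder"]))

# Demonstrated here on a toy module's state dict:
toy_sd = torch.nn.Linear(4, 2).state_dict()
print(summarize_state_dict(toy_sd))  # {'weight': (2, 4), 'bias': (2,)}
```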
## Citation
```bibtex
@misc{terver2025drivessuccessphysicalplanning,
  title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
  author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
  year={2025},
  eprint={2512.24497},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.24497},
}
```
## License
These models are licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
## Links
- 📄 [Paper](https://arxiv.org/abs/2512.24497)
- 💻 [GitHub Repository](https://github.com/facebookresearch/jepa-wms)
- 🤗 [Datasets](https://huggingface.co/datasets/facebook/jepa-wms)
- 🤗 [Models](https://huggingface.co/facebook/jepa-wms)