---
license: cc-by-nc-4.0
tags:
- robotics
- world-model
- jepa
- planning
- pytorch
library_name: pytorch
pipeline_tag: robotics
datasets:
- facebook/jepa-wms
arxiv: "2512.24497"
---

<h1 align="center">
<p>🤗 <b>JEPA-WMs Pretrained Models</b></p>
</h1>

<div align="center" style="line-height: 1;">
<a href="https://github.com/facebookresearch/jepa-wms" target="_blank" style="margin: 2px;"><img alt="Github" src="https://img.shields.io/badge/Github-facebookresearch%2Fjepa--wms-black?logo=github" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/facebook/jepa-wms" target="_blank" style="margin: 2px;"><img alt="HuggingFace" src="https://img.shields.io/badge/🤗%20HuggingFace-facebook%2Fjepa--wms-ffc107" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://arxiv.org/abs/2512.24497" target="_blank" style="margin: 2px;"><img alt="ArXiv" src="https://img.shields.io/badge/arXiv-2512.24497-b5212f?logo=arxiv" style="display: inline-block; vertical-align: middle;"/></a>
</div>

<br>

<p align="center">
<b><a href="https://ai.facebook.com/research/">Meta AI Research, FAIR</a></b>
</p>

<p align="center">
This 🤗 Hugging Face repository hosts pretrained <b>JEPA-WM</b> world models.<br>
📖 See the <a href="https://github.com/facebookresearch/jepa-wms">main repository</a> for training code and datasets.
</p>

This repository contains pretrained world model checkpoints from the paper
["What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"](https://arxiv.org/abs/2512.24497).

## Available Models

### JEPA-WM Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `jepa_wm_droid` | DROID & RoboCasa | 256×256 | DINOv3 ViT-L/16 | 12 |
| `jepa_wm_metaworld` | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pusht` | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_pointmaze` | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| `jepa_wm_wall` | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |

### DINO-WM Baseline Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `dino_wm_droid` | DROID & RoboCasa | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_metaworld` | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pusht` | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_pointmaze` | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| `dino_wm_wall` | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |

### V-JEPA-2-AC Baseline Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|-------|-------------|------------|---------|-------------|
| `vjepa2_ac_droid` | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
| `vjepa2_ac_oss` | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |

### VM2M Decoder Heads

| Model | Encoder | Resolution |
|-------|---------|------------|
| `dinov2_vits_224` | DINOv2 ViT-S/14 | 224×224 |
| `dinov2_vits_224_INet` | DINOv2 ViT-S/14 | 224×224 |
| `dinov3_vitl_256_INet` | DINOv3 ViT-L/16 | 256×256 |
| `vjepa2_vitg_256_INet` | V-JEPA-2 ViT-G/16 | 256×256 |
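
The decoder heads are stored as checkpoint files in this same repository, so they can be fetched the same way as the world-model checkpoints (see the Hugging Face Hub example below). A minimal sketch; the exact filename is an assumption based on the model names above, so list the repository files first to confirm it:

```python
from huggingface_hub import hf_hub_download, list_repo_files

# List the repository files to find the exact decoder-head filename.
files = list_repo_files("facebook/jepa-wms")
print([f for f in files if "dinov2_vits_224" in f])

# Hypothetical filename, assuming the decoder heads follow the same
# `<model_name>.pth.tar` convention as the world-model checkpoints.
head_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="dinov2_vits_224.pth.tar",
)
```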

## Usage

### Via PyTorch Hub (Recommended)

```python
import torch

# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')

# Load DINO-WM baselines
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')

# Load V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
```
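
Each checkpoint in the tables above has its own entrypoint, so `torch.hub` can also enumerate and document them. A minimal sketch using the standard `torch.hub.list` and `torch.hub.help` APIs, assuming the repository's `hubconf.py` exposes one entrypoint per checkpoint, as the loading calls above suggest:

```python
import torch

# Enumerate all entrypoints declared in the repo's hubconf.py.
entrypoints = torch.hub.list('facebookresearch/jepa-wms')
print(entrypoints)

# Show the docstring of one entrypoint before downloading its weights.
print(torch.hub.help('facebookresearch/jepa-wms', 'jepa_wm_droid'))
```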

### Via Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
import torch

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar"
)

# Load checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys())  # dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
```
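
Because the checkpoint stores plain state dicts, you can inspect the weights without instantiating the model. A minimal sketch, assuming `'encoder'` and `'predictor'` are flat PyTorch `state_dict`s mapping parameter names to tensors (`'heads'` may nest one state dict per head, in which case iterate one level deeper):

```python
# Count tensors and parameters per component straight from the state dicts.
for name in ("encoder", "predictor"):
    sd = checkpoint[name]
    n_params = sum(t.numel() for t in sd.values())
    print(f"{name}: {len(sd)} tensors, {n_params / 1e6:.1f}M parameters")
```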

> **Note**: This only downloads the weights. To instantiate the full model with the correct
> architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the
> [jepa-wms repository](https://github.com/facebookresearch/jepa-wms) and using the training/eval scripts.

## Citation

```bibtex
@misc{terver2025drivessuccessphysicalplanning,
  title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
  author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
  year={2025},
  eprint={2512.24497},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.24497},
}
```

## License

These models are licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).

## Links

- 📄 [Paper](https://arxiv.org/abs/2512.24497)
- 💻 [GitHub Repository](https://github.com/facebookresearch/jepa-wms)
- 🤗 [Datasets](https://huggingface.co/datasets/facebook/jepa-wms)
- 🤗 [Models](https://huggingface.co/facebook/jepa-wms)