Commit 63aa637 (verified) by Basile-Terv · 1 parent: fce4d26

Upload README.md with huggingface_hub

  1. README.md +114 -3
README.md CHANGED
@@ -1,3 +1,114 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - robotics
+ - world-model
+ - jepa
+ - planning
+ - pytorch
+ library_name: pytorch
+ pipeline_tag: robotics
+ datasets:
+ - facebook/jepa-wms
+ ---
+
+ # JEPA-WMs: Pretrained World Models
+
+ This repository contains pretrained world model checkpoints from the paper
+ ["What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"](https://arxiv.org/abs/2512.24497).
+
+ ## Available Models
+
+ ### JEPA-WM Models
+
+ | Model | Environment | Resolution | Encoder | Pred. Depth |
+ |-------|-------------|------------|---------|-------------|
+ | `jepa_wm_droid` | DROID & RoboCasa | 256×256 | DINOv3 ViT-L/16 | 12 |
+ | `jepa_wm_metaworld` | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `jepa_wm_pusht` | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `jepa_wm_pointmaze` | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `jepa_wm_wall` | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |
+
+ ### DINO-WM Baseline Models
+
+ | Model | Environment | Resolution | Encoder | Pred. Depth |
+ |-------|-------------|------------|---------|-------------|
+ | `dino_wm_droid` | DROID & RoboCasa | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `dino_wm_metaworld` | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `dino_wm_pusht` | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `dino_wm_pointmaze` | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
+ | `dino_wm_wall` | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |
+
+ ### V-JEPA-2-AC Baseline Models
+
+ | Model | Environment | Resolution | Encoder | Pred. Depth |
+ |-------|-------------|------------|---------|-------------|
+ | `vjepa2_ac_droid` | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
+ | `vjepa2_ac_oss` | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
+
+ ### VM2M Decoder Heads
+
+ | Model | Encoder | Resolution |
+ |-------|---------|------------|
+ | `dinov2_vits_224` | DINOv2 ViT-S/14 | 224×224 |
+ | `dinov2_vits_224_INet` | DINOv2 ViT-S/14 | 224×224 |
+ | `dinov3_vitl_256_INet` | DINOv3 ViT-L/16 | 256×256 |
+ | `vjepa2_vitg_256_INet` | V-JEPA-2 ViT-G/16 | 256×256 |
+
+ ## Usage
+
+ ### Via PyTorch Hub (Recommended)
+
+ ```python
+ import torch
+
+ # Load JEPA-WM models
+ model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
+ model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')
+
+ # Load DINO-WM baselines
+ model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')
+
+ # Load V-JEPA-2-AC baseline
+ model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
+ ```
+
+ ### Via Hugging Face Hub
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import torch
+
+ # Download a specific checkpoint
+ checkpoint_path = hf_hub_download(
+     repo_id="facebook/jepa-wms",
+     filename="jepa_wm_droid.pth.tar"
+ )
+
+ # Load with PyTorch
+ checkpoint = torch.load(checkpoint_path, map_location="cpu")
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @misc{terver2025drivessuccessphysicalplanning,
+     title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
+     author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
+     year={2025},
+     eprint={2512.24497},
+     archivePrefix={arXiv},
+     primaryClass={cs.AI},
+     url={https://arxiv.org/abs/2512.24497},
+ }
+ ```
+
+ ## License
+
+ These models are licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+
+ ## Links
+
+ - 📄 [Paper](https://arxiv.org/abs/2512.24497)
+ - 💻 [GitHub Repository](https://github.com/facebookresearch/jepa-wms)
+ - 🤗 [Dataset](https://huggingface.co/datasets/facebook/jepa-wms)
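
The Hugging Face Hub snippet in the README stops at `torch.load`, and the README does not document what the checkpoint file contains. As a minimal sketch for exploring it, the hypothetical helper below (`summarize_checkpoint` is not part of `jepa-wms`) walks a nested mapping and reports each entry's shape or type. It assumes the loaded object is a dictionary, which is common for `.pth.tar` files but is not confirmed by the source.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, prefix="", max_depth=2):
    """Return {dotted_key: description} for a nested checkpoint mapping.

    Hypothetical helper for illustration only. Anything exposing a
    `.shape` attribute (e.g. a tensor) is described by its shape, nested
    mappings are expanded up to `max_depth` levels, and every other
    value is described by its type name.
    """
    summary = {}
    if not isinstance(obj, Mapping):
        # The checkpoint layout is an assumption; handle non-dict objects.
        summary[prefix or "<root>"] = type(obj).__name__
        return summary
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        shape = getattr(value, "shape", None)
        if shape is not None:
            summary[name] = f"tensor{tuple(shape)}"
        elif isinstance(value, Mapping) and max_depth > 1:
            summary.update(summarize_checkpoint(value, name, max_depth - 1))
        else:
            summary[name] = type(value).__name__
    return summary


if __name__ == "__main__":
    # Toy stand-in for a loaded checkpoint; the keys here are made up.
    class FakeTensor:
        shape = (4, 8)

    toy = {"epoch": 3, "weights": {"layer.weight": FakeTensor()}}
    for name, description in summarize_checkpoint(toy).items():
        print(f"{name}: {description}")
```

With a real file, pass the result of `torch.load(checkpoint_path, map_location="cpu")` to the helper. Recent PyTorch releases default to `weights_only=True`, so a checkpoint that pickles non-tensor objects may need `weights_only=False`; only do that for files you trust.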