---
license: apache-2.0
tags:
  - robotics
  - navigation
  - visual-navigation
  - embodied-ai
  - onnx
pipeline_tag: robotics
---

# Navigation Model Zoo

A collection of vision-based navigation policies exported to **ONNX**, each wrapped in a small,
uniform Python inference API. Maintained by **Honglin He @ UCLA-VAIL**.

Every model takes a short history of RGB frames and predicts a local trajectory (and optionally a
distance-to-goal / arrival signal); a built-in PD controller turns the trajectory into `(v, ω)`
velocity commands. All models share the same wrapper interface so they can be swapped and
benchmarked without per-model glue code.

## Models

| Folder | Model / paper | Goal mode | Context | Input H×W | Waypoints | Weights |
|--------|---------------|-----------|:-------:|:---------:|:---------:|---------|
| [`GNM_GL_Official`](GNM_GL_Official) | [GNM](https://arxiv.org/abs/2210.03370) · ICRA 2023 | goal-free | 6 | 64×85 | 5 | `gnm_imagegoal.onnx` (+`.data`) · 35 MB |
| [`Vint_GL_Official`](Vint_GL_Official) | [ViNT](https://arxiv.org/abs/2306.14846) · CoRL 2023 | goal-free | 6 | 64×85 | 5 | `vint_imagegoal.onnx` (+`.data`) · 97 MB |
| [`NoMaD_GL_Official`](NoMaD_GL_Official) | [NoMaD](https://arxiv.org/abs/2310.07896) · ICRA 2024 | goal-free (diffusion) | 4 | 96×96 | 8 (×8 samples) | 3× `.onnx` (+`.data`) · 111 MB |
| [`CityWalker_PG_Official`](CityWalker_PG_Official) | [CityWalker](https://arxiv.org/abs/2411.17820) · CVPR 2025 | point-goal | 5 | 350×630 | 5 | `citywalker.onnx` · 806 MB |
| [`MBRA_PG_Official`](MBRA_PG_Official) | [MBRA](https://arxiv.org/abs/2505.05592) · RA-L 2025 | point-goal | 6 | 96×96 | 8 | `mbra.onnx` · 254 MB |
| [`S2E`](S2E) | [S2E](https://arxiv.org/abs/2507.22028) · ICLR 2026 | point-goal / goal-free | 11 | 256×256 | 10 | `s2e.onnx` · 382 MB |
| [`MIMIC`](MIMIC) | [MIMIC](https://arxiv.org/abs/2603.22527) · ICRA 2026 | goal-free | 16 | 288×512 | 13 | `mimic.onnx` · 318 MB |

Suffix legend: `PG` = point-goal, `GL` = goal-less (goal-free). Models with a `.onnx.data` companion
(GNM, ViNT, NoMaD) use ONNX external weights — keep each `.onnx` and its `.onnx.data` together.

## Common interface

Each folder is a self-contained module exposing one navigator class. They all follow the same contract:

```python
import numpy as np
from MBRA_PG_Official.inference import MBRAPGNavigator   # run from the repo root

nav = MBRAPGNavigator(device="cuda")          # use device="cpu" if you have no GPU

# obs: (B, nav.context_size, 3, H, W) float32 in [0, 1]
#      the wrapper resizes & normalizes to the model's spec internally
obs = np.random.rand(1, nav.context_size, 3, 96, 96).astype(np.float32)

# Point-goal models take goal_xy (standard frame: x=forward, y=left, meters);
# goal-free models omit it.
traj, scores = nav.inference_trajectory(obs, goal_xy=np.array([5.0, 0.2]))  # (B, M, W, 2) meters
vw, best     = nav.inference_vw(obs,        goal_xy=np.array([5.0, 0.2]))   # vw: (B, 2) = [v, ω]

nav.reset()   # clears PD-controller velocity smoothing between episodes
```
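
How the wrappers turn a trajectory into `(v, ω)` is not spelled out in this README. As a rough illustration only, the sketch below assumes a PD law on the heading error toward a single waypoint, clipping to `max_v` / `max_w`, and exponential smoothing that `reset()` clears. The class name `SimpleWaypointController`, the gains, the time step, and the smoothing factor are all hypothetical and not the repository's implementation. Positive `ω` is assumed to mean a left turn, matching `y = left`.

```python
import math


class SimpleWaypointController:
    """Hypothetical PD-style tracker; gains and smoothing are illustrative only."""

    def __init__(self, max_v=0.5, max_w=1.0, kp=1.0, kd=0.1, dt=0.2, alpha=0.5):
        self.max_v, self.max_w = max_v, max_w
        self.kp, self.kd, self.dt, self.alpha = kp, kd, dt, alpha
        self.reset()

    def reset(self):
        # Forget the smoothing state and previous heading error between episodes.
        self.prev_err = 0.0
        self.prev_vw = (0.0, 0.0)

    def step(self, waypoint_xy):
        x, y = waypoint_xy                       # standard frame: x forward, y left
        err = math.atan2(y, x)                   # heading error toward the waypoint
        d_err = (err - self.prev_err) / self.dt  # derivative term
        w = self.kp * err + self.kd * d_err
        # Slow down when the waypoint is off to the side; never drive backwards.
        v = math.hypot(x, y) / self.dt * max(0.0, math.cos(err))
        v = min(max(v, 0.0), self.max_v)
        w = min(max(w, -self.max_w), self.max_w)
        # Exponential smoothing against the previous command.
        a = self.alpha
        v = a * v + (1 - a) * self.prev_vw[0]
        w = a * w + (1 - a) * self.prev_vw[1]
        self.prev_err, self.prev_vw = err, (v, w)
        return v, w
```

Calling `reset()` between episodes keeps the first command of a new run from being blended with the last command of the previous one.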

Conventions shared by every model:

- **Coordinate frame:** all user-facing inputs/outputs use the *standard frame* (`x = forward`, `y = left`, in meters). Models with a different internal convention (e.g. CityWalker) convert transparently.
- **Observations:** `(B, context_size, 3, H, W)`, `float32`, pixel values in `[0, 1]`. The wrapper handles resize and any ImageNet normalization. *(Exception: `MIMIC` expects frames already at 288×512 and does not resize.)*
- **`inference_trajectory(obs[, goal_xy])`** → `(trajectory, scores)`. `trajectory` is `(B, M, W, 2)` in meters, where `M` is the number of modes (1 for unimodal, 8 for NoMaD) and `W` the waypoint count; `scores` is `(B, M)`.
- **`inference_vw(obs[, goal_xy])`** → `(vw, best_traj)`, where `vw` is a `(B, 2)` torch tensor of `[linear_v, angular_w]`. Tune limits with `max_v` / `max_w` at construction.
- Goal-free models (`Vint`, `GNM`, `NoMaD`, `MIMIC`) ignore `goal_xy`; call `inference_trajectory(obs)`.
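
To make the observation convention concrete, here is a minimal sketch of packing camera frames into the expected layout. `frames_to_obs` is a hypothetical helper, not part of the wrappers; it assumes RGB `uint8` frames of identical size in oldest-first order, and it pads a short history by repeating the oldest frame, which is an assumption rather than documented behavior.

```python
import numpy as np


def frames_to_obs(frames, context_size):
    """Stack HxWx3 uint8 RGB frames into a (1, T, 3, H, W) float32 array in [0, 1]."""
    frames = list(frames)
    if len(frames) < context_size:
        # Assumption: pad a short history by repeating the oldest frame.
        frames = [frames[0]] * (context_size - len(frames)) + frames
    recent = np.stack(frames[-context_size:], axis=0)   # (T, H, W, 3)
    obs = recent.astype(np.float32) / 255.0             # scale to [0, 1]
    obs = obs.transpose(0, 3, 1, 2)                     # (T, 3, H, W)
    return np.ascontiguousarray(obs[None])              # add the batch dimension
```

The result can be passed as `obs` to `inference_trajectory` or `inference_vw`, with `context_size` taken from `nav.context_size`.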

## Installation

```bash
pip install onnxruntime-gpu numpy torch torchvision pyyaml pillow
# CPU-only: use onnxruntime instead of onnxruntime-gpu
pip install opencv-python   # required by S2E (frame resizing)
```

Optional, lab-internal dependency: `Vint`, `GNM`, and `NoMaD` expose an extra `inference_vw_pp()`
method that uses `urbansim.custom.pp.PurePursuitController`; it is imported lazily and only needed
for that method. **`MIMIC` imports `urbansim` at module load**, so its `inference.py` will not import
without the `urbansim` package on your path.

## Model details

### GNM_GL_Official — `gnm_imagegoal.onnx` (+ `.onnx.data`)
**Paper:** *GNM: A General Navigation Model to Drive Any Robot* (ICRA 2023) · [arXiv:2210.03370](https://arxiv.org/abs/2210.03370) · [code](https://github.com/robodhruv/drive-any-robot)

Goal-free General Navigation Model — same NavDP image-goal I/O contract as ViNT (`obs_img (B,18,64,85)` + `goal_img (B,3,64,85)` → `dist_pred (B,1)`, `action_pred (B,5,4)`), with a lower top speed. Expects input downsampled to ≈ 3 Hz.

### Vint_GL_Official — `vint_imagegoal.onnx` (+ `.onnx.data`)
**Paper:** *ViNT: A Foundation Model for Visual Navigation* (CoRL 2023) · [arXiv:2306.14846](https://arxiv.org/abs/2306.14846) · [project](https://general-navigation-models.github.io/vint/)

Goal-free ViNT (NavDP image-goal backbone run with a random goal image). **ONNX I/O:** `obs_img (B,18,64,85)` (6 ImageNet-normalized frames × 3 ch) + `goal_img (B,3,64,85)` (random noise) → `dist_pred (B,1)`, `action_pred (B,5,4)`. Cumulative `xy` is already baked in; the wrapper scales by the 0.8 m metric spacing. Expects input downsampled to ≈ 3 Hz.
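
The tensor shapes above can be sketched as follows. This is an illustration under stated assumptions, not the wrapper's code: frames are assumed to be concatenated along the channel axis in their given order, the random goal is assumed to be standard-normal noise, and the first two channels of `action_pred` are assumed to hold the cumulative `xy`. The function names `pack_inputs` and `decode_actions` are hypothetical; the mean/std values are the standard ImageNet statistics.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 1, 3, 1, 1)
METRIC_SPACING_M = 0.8  # metric spacing quoted in the text above


def pack_inputs(obs, rng):
    """obs: (B, 6, 3, 64, 85) float32 in [0, 1], already resized."""
    b, t, c, h, w = obs.shape
    normed = (obs - IMAGENET_MEAN) / IMAGENET_STD
    obs_img = normed.reshape(b, t * c, h, w)             # (B, 18, 64, 85)
    # Assumption: the random goal image is standard-normal noise.
    goal_img = rng.standard_normal((b, c, h, w)).astype(np.float32)
    return obs_img, goal_img


def decode_actions(action_pred):
    """action_pred: (B, 5, 4). Assumption: channels 0-1 are cumulative xy."""
    return action_pred[..., :2] * METRIC_SPACING_M       # (B, 5, 2) in meters
```

The same packing would apply to GNM, since it shares this I/O contract.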

### NoMaD_GL_Official — 3× ONNX (diffusion, + `.onnx.data`)
**Paper:** *NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration* (ICRA 2024) · [arXiv:2310.07896](https://arxiv.org/abs/2310.07896) · [project](https://general-navigation-models.github.io/nomad/)

Goal-free diffusion policy. Runs a 10-step DDPM loop (`squaredcos_cap_v2`) over 3 components:
`nomad_vision_encoder.onnx` (`obs_img (B,12,96,96)` + `goal_img (B,3,96,96)` + `goal_mask (B)` → `cond (B,256)`), `nomad_noise_pred.onnx` (one denoising step), and `nomad_dist_pred.onnx`. Produces **8 trajectory samples** → `trajectory (B,8,8,2)` in meters (decode: unnormalize → cumsum → ×0.267 m spacing). This is the only multi-modal model and the slowest (diffusion + multiple samples).
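
The decode chain (unnormalize → cumsum → ×0.267 m) might look like the sketch below. It assumes the diffusion output is a set of per-step deltas min-max normalized to `[-1, 1]`; the function name `decode_samples` and the `lo` / `hi` statistics are hypothetical placeholders, since the actual action statistics are not given here.

```python
import numpy as np

WAYPOINT_SPACING_M = 0.267  # spacing quoted in the text above


def decode_samples(deltas_norm, lo, hi):
    """deltas_norm: (B, 8, 8, 2) in [-1, 1]; lo/hi: assumed per-axis action stats."""
    lo = np.asarray(lo, dtype=np.float32)
    hi = np.asarray(hi, dtype=np.float32)
    deltas = (deltas_norm + 1.0) / 2.0 * (hi - lo) + lo   # undo min-max normalization
    waypoints = np.cumsum(deltas, axis=2)                 # accumulate along the waypoint axis
    return waypoints * WAYPOINT_SPACING_M                 # (B, 8, 8, 2) in meters
```

Each of the 8 samples along axis 1 is decoded independently, so the output keeps the `(B, M, W, 2)` layout used by `inference_trajectory`.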

### CityWalker_PG_Official — `citywalker.onnx`
**Paper:** *CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos* (CVPR 2025) · [arXiv:2411.17820](https://arxiv.org/abs/2411.17820) · [project](https://ai4ce.github.io/CityWalker/)

Point-goal urban walker. **ONNX I/O:** `obs_images (B,5,3,350,630)` + `trajectory (B,6,2)` past waypoints → `wp_pred (B,5,2)`, `arrive_pred (B,1)` (arrival probability). Images are ImageNet-normalized internally; the model's internal `y=forward, x=right` frame is converted to standard frame by the wrapper. Input rate ≈ 5 Hz.
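
The frame conversion mentioned above is a 90° relabeling of axes. A minimal sketch, assuming waypoints are stored as `[x, y]` pairs in the last dimension; the helper names are hypothetical and not taken from the wrapper:

```python
import numpy as np


def internal_to_standard(wp):
    """(..., 2) in [x=right, y=forward] -> (..., 2) in [x=forward, y=left]."""
    wp = np.asarray(wp, dtype=np.float32)
    # Forward stays forward; "left" is the negative of "right".
    return np.stack([wp[..., 1], -wp[..., 0]], axis=-1)


def standard_to_internal(xy):
    """Inverse mapping, e.g. for feeding a standard-frame goal to the model."""
    xy = np.asarray(xy, dtype=np.float32)
    return np.stack([-xy[..., 1], xy[..., 0]], axis=-1)
```

For example, a point 1 m to the right and 2 m ahead in the internal frame becomes `[2, -1]` in the standard frame.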

### MBRA_PG_Official — `mbra.onnx`
**Paper:** *Learning to Drive Anywhere with Model-Based Reannotation* (RA-L 2025) · [arXiv:2505.05592](https://arxiv.org/abs/2505.05592) · [project](https://model-base-reannotation.github.io/)

Point-goal policy. **ONNX I/O:** `obs_images (B,6,3,96,96)` ImageNet-normalized + `goal_pose (B,4)` = `[x, y, sin(yaw), cos(yaw)]` → `waypoints (B,8,4)`. Goal is given as `goal_xy` (meters) and converted internally; waypoints are un-normalized by a 0.8 m metric spacing. Input rate ≈ 5 Hz.

### S2E — `s2e.onnx`
**Paper:** *From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning* (ICLR 2026) · [arXiv:2507.22028](https://arxiv.org/abs/2507.22028) · [project](https://metadriverse.github.io/s2e)

UCLA-VAIL navigation foundation model; this is the behavior-cloning, point-goal, web-pretrained variant (`S2EBC-PG-Web100`). **ONNX I/O:** `obs_images (B,11,3,256,256)` in `[0,1]` (no ImageNet norm) + `goal (B,3)` = `[norm_dist, cos(θ), sin(θ)]` → `wp_pred (B,10,3)` `[x,y,yaw]`, `wp_pred_score (B,63)` mode scores. Frames are resized to 256×256 with OpenCV.
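
One plausible way to build the `goal (B,3)` vector from a standard-frame `goal_xy` is sketched below. The source does not say how `norm_dist` is normalized or how `θ` is defined, so this assumes `θ = atan2(y, x)` and a distance clipped and divided by a hypothetical `max_dist`; `encode_goal` is an illustrative name, not the wrapper's function.

```python
import math


def encode_goal(goal_xy, max_dist=10.0):
    """Return [norm_dist, cos(theta), sin(theta)] for one goal in meters.

    Assumptions: theta is the bearing atan2(y, x); max_dist is a made-up cap.
    """
    x, y = goal_xy
    dist = math.hypot(x, y)
    theta = math.atan2(y, x)
    norm_dist = min(dist, max_dist) / max_dist   # clip, then scale to [0, 1]
    return [norm_dist, math.cos(theta), math.sin(theta)]
```

Encoding the bearing as a cosine/sine pair avoids the wrap-around discontinuity that a raw angle would have at ±π.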

### MIMIC — `mimic.onnx`
**Paper:** *Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion* (ICRA 2026) · [arXiv:2603.22527](https://arxiv.org/abs/2603.22527) · [project](https://vail-ucla.github.io/MIMIC)

UCLA-VAIL goal-free long-context sidewalk policy. **ONNX I/O:** `input (1,16,3,288,512)` in `[0,1]` → `output (1,15,3)` `[x,y,yaw]` at non-uniform timestamps (0.2 s–5.0 s @ 5 Hz). Batch is processed one sample at a time; the wrapper keeps the first 13 waypoints (~4 s) and scales to meters. Requires `urbansim` (see Installation).

## Downloading

**Full repo** (includes the LFS-tracked ONNX weights):
```bash
hf download UCLA-VAIL/Navigation-Model-Zoo-Public --local-dir ./Navigation-Model-Zoo-Public
```

**One model** — fetch just its folder, e.g. MBRA:
```bash
hf download UCLA-VAIL/Navigation-Model-Zoo-Public \
  --include "MBRA_PG_Official/*" --local-dir .
```

Then run from the repo root: `from MBRA_PG_Official.inference import MBRAPGNavigator`.

> **External weights:** GNM, ViNT, and NoMaD ship `*.onnx.data` files — keep each `.onnx` and its
> `.onnx.data` together in the same folder so ONNX Runtime can resolve the weights.
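
After a partial download it can be useful to confirm that no `.onnx.data` companion went missing. The sketch below uses only the standard library; `onnx_without_data` is a hypothetical helper, and it assumes each companion is named exactly `<model>.onnx.data` and sits next to its `.onnx` file. It is only meaningful for the folders that ship external weights (GNM, ViNT, NoMaD).

```python
from pathlib import Path


def onnx_without_data(folder):
    """Return names of .onnx files in `folder` that have no sibling .onnx.data file."""
    folder = Path(folder)
    return sorted(
        p.name
        for p in folder.glob("*.onnx")
        if not p.with_name(p.name + ".data").exists()
    )
```

An empty list for `GNM_GL_Official`, `Vint_GL_Official`, and `NoMaD_GL_Official` suggests the weights are in place; for the other folders every `.onnx` file will be listed, which is expected.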

## Intended use & limitations

These are **research artifacts** for navigation research, reproduction, and benchmarking — not
safety-validated for deployment on real robots without additional testing. Each policy's behavior
is bounded by its training distribution (camera intrinsics, embodiment, frame rate, environment).
Several wrappers rectify/resize inputs to a specific training camera; mismatched cameras may degrade
performance.

## License

Released under **Apache 2.0**. Individual models carry the licenses and terms of their original
sources (ViNT, GNM, NoMaD, CityWalker, MBRA) — check upstream before commercial use.

## Citation

If you use a model from this zoo, please cite its original paper. 

**GNM**
```bibtex
@inproceedings{shah2023gnm,
  title={Gnm: A general navigation model to drive any robot},
  author={Shah, Dhruv and Sridhar, Ajay and Bhorkar, Arjun and Hirose, Noriaki and Levine, Sergey},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={7226--7233},
  year={2023},
  organization={IEEE}
}
```

**ViNT**
```bibtex
@article{shah2023vint,
  title={ViNT: A foundation model for visual navigation},
  author={Shah, Dhruv and Sridhar, Ajay and Dashora, Nitish and Stachowicz, Kyle and Black, Kevin and Hirose, Noriaki and Levine, Sergey},
  journal={arXiv preprint arXiv:2306.14846},
  year={2023}
}
```

**NoMaD**
```bibtex
@inproceedings{sridhar2024nomad,
  title={Nomad: Goal masked diffusion policies for navigation and exploration},
  author={Sridhar, Ajay and Shah, Dhruv and Glossop, Catherine and Levine, Sergey},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={63--70},
  year={2024},
  organization={IEEE}
}
```

**CityWalker**
```bibtex
@inproceedings{liu2025citywalker,
  title={Citywalker: Learning embodied urban navigation from web-scale videos},
  author={Liu, Xinhao and Li, Jintong and Jiang, Yicheng and Sujay, Niranjan and Yang, Zhicheng and Zhang, Juexiao and Abanes, John and Zhang, Jing and Feng, Chen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={6875--6885},
  year={2025}
}
```

**MBRA**
```bibtex
@article{hirose2025learning,
  title={Learning to drive anywhere with model-based reannotation},
  author={Hirose, Noriaki and Ignatova, Lydia and Stachowicz, Kyle and Glossop, Catherine and Levine, Sergey and Shah, Dhruv},
  journal={IEEE Robotics and Automation Letters},
  volume={11},
  number={2},
  pages={1242--1249},
  year={2025},
  publisher={IEEE}
}
```

**S2E**
```bibtex
@article{he2025seeing,
  title={From seeing to experiencing: Scaling navigation foundation models with reinforcement learning},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2507.22028},
  year={2025}
}
```

**MIMIC**
```bibtex
@article{he2026learning,
  title={Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2603.22527},
  year={2026}
}
```

## Contact

Maintained by [UCLA-VAIL](https://vail-ucla.github.io/). Open an issue/discussion on the
repository page for questions or contributions.