---
license: apache-2.0
tags:
- robotics
- navigation
- visual-navigation
- embodied-ai
- onnx
pipeline_tag: robotics
---

# Navigation Model Zoo

A collection of vision-based navigation policies exported to **ONNX**, each wrapped in a small, uniform Python inference API. Maintained by **Honglin He @ UCLA-VAIL**.

Every model takes a short history of RGB frames and predicts a local trajectory (and optionally a distance-to-goal / arrival signal); a built-in PD controller turns the trajectory into `(v, ω)` velocity commands. All models share the same wrapper interface so they can be swapped and benchmarked without per-model glue code.

## Models

| Folder | Model / paper | Goal mode | Context | Input H×W | Waypoints | Weights |
|--------|---------------|-----------|:-------:|:---------:|:---------:|---------|
| [`GNM_GL_Official`](GNM_GL_Official) | [GNM](https://arxiv.org/abs/2210.03370) · ICRA 2023 | goal-free | 6 | 64×85 | 5 | `gnm_imagegoal.onnx` (+`.data`) · 35 MB |
| [`Vint_GL_Official`](Vint_GL_Official) | [ViNT](https://arxiv.org/abs/2306.14846) · CoRL 2023 | goal-free | 6 | 64×85 | 5 | `vint_imagegoal.onnx` (+`.data`) · 97 MB |
| [`NoMaD_GL_Official`](NoMaD_GL_Official) | [NoMaD](https://arxiv.org/abs/2310.07896) · ICRA 2024 | goal-free (diffusion) | 4 | 96×96 | 8 × 8 samples | 3× `.onnx` (+`.data`) · 111 MB |
| [`CityWalker_PG_Official`](CityWalker_PG_Official) | [CityWalker](https://arxiv.org/abs/2411.17820) · CVPR 2025 | point-goal | 5 | 350×630 | 5 | `citywalker.onnx` · 806 MB |
| [`MBRA_PG_Official`](MBRA_PG_Official) | [MBRA](https://arxiv.org/abs/2505.05592) · RA-L 2025 | point-goal | 6 | 96×96 | 8 | `mbra.onnx` · 254 MB |
| [`S2E`](S2E) | [S2E](https://arxiv.org/abs/2507.22028) · ICLR 2026 | point-goal / goal-free | 11 | 256×256 | 10 | `s2e.onnx` · 382 MB |
| [`MIMIC`](MIMIC) | [MIMIC](https://arxiv.org/abs/2603.22527) · ICRA 2026 | goal-free | 16 | 288×512 | 13 | `mimic.onnx` · 318 MB |

Suffix legend: `PG` = point-goal, `GL` = goal-less (goal-free).
Models with a `.onnx.data` companion (GNM, ViNT, NoMaD) use ONNX external weights — keep each `.onnx` and its `.onnx.data` together.

## Common interface

Each folder is a self-contained module exposing one navigator class. They all follow the same contract:

```python
import numpy as np
from MBRA_PG_Official.inference import MBRAPGNavigator   # run from the repo root

nav = MBRAPGNavigator(device="cuda")   # use device="cpu" if you have no GPU

# obs: (B, nav.context_size, 3, H, W) float32 in [0, 1]
# the wrapper resizes & normalizes to the model's spec internally
obs = np.random.rand(1, nav.context_size, 3, 96, 96).astype(np.float32)

# Point-goal models take goal_xy (standard frame: x=forward, y=left, meters);
# goal-free models omit it.
traj, scores = nav.inference_trajectory(obs, goal_xy=np.array([5.0, 0.2]))  # (B, M, W, 2) meters
vw, best = nav.inference_vw(obs, goal_xy=np.array([5.0, 0.2]))              # vw: (B, 2) = [v, ω]

nav.reset()   # clears PD-controller velocity smoothing between episodes
```

Conventions shared by every model:

- **Coordinate frame** — all user-facing inputs/outputs are *standard frame*: `x = forward`, `y = left`, in meters. Models with a different internal convention (e.g. CityWalker) convert transparently.
- **Observations** — `(B, context_size, 3, H, W)`, `float32`, pixel values in `[0, 1]`. The wrapper handles resize and any ImageNet normalization. *(Exception: `MIMIC` expects frames already at 288×512 and does not resize.)*
- **`inference_trajectory(obs[, goal_xy])`** → `(trajectory, scores)`. `trajectory` is `(B, M, W, 2)` in meters, where `M` is the number of modes (1 for unimodal, 8 for NoMaD) and `W` the waypoint count; `scores` is `(B, M)`.
- **`inference_vw(obs[, goal_xy])`** → `(vw, best_traj)` where `vw` is a `(B, 2)` torch tensor of `[linear_v, angular_w]`. Tune limits with `max_v` / `max_w` at construction.
- Goal-free models (`Vint`, `GNM`, `NoMaD`, `MIMIC`) ignore `goal_xy` — call `inference_trajectory(obs)`.
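
The observation and coordinate-frame conventions above can be sketched with two small NumPy helpers. This is a minimal sketch, not part of the zoo's API: `FrameHistory` and `internal_to_standard` are hypothetical names, the frames are assumed to arrive as HWC `uint8` arrays of identical size, and the oldest-to-newest ordering of the history is an assumption that should be checked against each wrapper.

```python
from collections import deque

import numpy as np


class FrameHistory:
    """Rolling buffer that turns camera frames into a (1, T, 3, H, W) observation.

    Hypothetical helper for illustration; not provided by the model zoo.
    """

    def __init__(self, context_size: int):
        self.frames = deque(maxlen=context_size)

    def push(self, frame_hwc_uint8: np.ndarray) -> None:
        # HWC uint8 in [0, 255] -> CHW float32 in [0, 1]
        chw = np.transpose(frame_hwc_uint8, (2, 0, 1)).astype(np.float32) / 255.0
        if not self.frames:
            # Assumption: at episode start, pad the history with copies of the first frame.
            self.frames.extend([chw] * self.frames.maxlen)
        else:
            # The deque drops the oldest frame automatically once it is full.
            self.frames.append(chw)

    def as_obs(self) -> np.ndarray:
        # Stack oldest -> newest (assumed order) and add the batch axis.
        return np.stack(list(self.frames), axis=0)[None]


def internal_to_standard(xy_internal: np.ndarray) -> np.ndarray:
    """Map points from (x=right, y=forward) to standard frame (x=forward, y=left)."""
    x_right, y_forward = xy_internal[..., 0], xy_internal[..., 1]
    # forward stays forward; left is the negative of right
    return np.stack([y_forward, -x_right], axis=-1)
```

With a navigator `nav` constructed as in the example above, one would create `FrameHistory(nav.context_size)`, call `push()` once per camera frame, and pass `as_obs()` to `nav.inference_trajectory(...)`. The frame conversion is only needed when working with raw model outputs; the wrappers already return standard-frame trajectories.
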
## Installation

```bash
pip install onnxruntime-gpu numpy torch torchvision pyyaml pillow   # CPU-only: use onnxruntime instead of onnxruntime-gpu
pip install opencv-python                                           # required by S2E (frame resizing)
```

Optional, lab-internal dependency: `Vint`, `GNM`, and `NoMaD` expose an extra `inference_vw_pp()` method that uses `urbansim.custom.pp.PurePursuitController`; it is imported lazily and only needed for that method. **`MIMIC` imports `urbansim` at module load**, so its `inference.py` will not import without the `urbansim` package on your path.

## Model details

### GNM_GL_Official — `gnm_imagegoal.onnx` (+ `.onnx.data`)

**Paper:** *GNM: A General Navigation Model to Drive Any Robot* (ICRA 2023) · [arXiv:2210.03370](https://arxiv.org/abs/2210.03370) · [code](https://github.com/robodhruv/drive-any-robot)

Goal-free General Navigation Model — same NavDP image-goal I/O contract as ViNT (`obs_img (B,18,64,85)` + `goal_img (B,3,64,85)` → `dist_pred (B,1)`, `action_pred (B,5,4)`), with a lower top speed. Expects input downsampled to ≈ 3 Hz.

### Vint_GL_Official — `vint_imagegoal.onnx` (+ `.onnx.data`)

**Paper:** *ViNT: A Foundation Model for Visual Navigation* (CoRL 2023) · [arXiv:2306.14846](https://arxiv.org/abs/2306.14846) · [project](https://general-navigation-models.github.io/vint/)

Goal-free ViNT (NavDP image-goal backbone run with a random goal image). **ONNX I/O:** `obs_img (B,18,64,85)` (6 ImageNet-normalized frames × 3 ch) + `goal_img (B,3,64,85)` (random noise) → `dist_pred (B,1)`, `action_pred (B,5,4)`. Cumulative `xy` is already baked in; the wrapper scales by the 0.8 m metric spacing. Expects input downsampled to ≈ 3 Hz.

### NoMaD_GL_Official — 3× ONNX (diffusion, + `.onnx.data`)

**Paper:** *NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration* (ICRA 2024) · [arXiv:2310.07896](https://arxiv.org/abs/2310.07896) · [project](https://general-navigation-models.github.io/nomad/)

Goal-free diffusion policy.
Runs a 10-step DDPM loop (`squaredcos_cap_v2`) over 3 components: `nomad_vision_encoder.onnx` (`obs_img (B,12,96,96)` + `goal_img (B,3,96,96)` + `goal_mask (B)` → `cond (B,256)`), `nomad_noise_pred.onnx` (one denoising step), and `nomad_dist_pred.onnx`. Produces **8 trajectory samples** → `trajectory (B,8,8,2)` meters (decode: unnormalize → cumsum → ×0.267 m spacing). This is the only multi-modal model and the slowest (diffusion + multiple samples).

### CityWalker_PG_Official — `citywalker.onnx`

**Paper:** *CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos* (CVPR 2025) · [arXiv:2411.17820](https://arxiv.org/abs/2411.17820) · [project](https://ai4ce.github.io/CityWalker/)

Point-goal urban walker. **ONNX I/O:** `obs_images (B,5,3,350,630)` + `trajectory (B,6,2)` past waypoints → `wp_pred (B,5,2)`, `arrive_pred (B,1)` (arrival probability). Images are ImageNet-normalized internally; the model's internal `y=forward, x=right` frame is converted to standard frame by the wrapper. Input rate ≈ 5 Hz.

### MBRA_PG_Official — `mbra.onnx`

**Paper:** *Learning to Drive Anywhere with Model-Based Reannotation* (RA-L 2025) · [arXiv:2505.05592](https://arxiv.org/abs/2505.05592) · [project](https://model-base-reannotation.github.io/)

Point-goal policy. **ONNX I/O:** `obs_images (B,6,3,96,96)` ImageNet-normalized + `goal_pose (B,4)` = `[x, y, sin(yaw), cos(yaw)]` → `waypoints (B,8,4)`. Goal is given as `goal_xy` (meters) and converted internally; waypoints are un-normalized by a 0.8 m metric spacing. Input rate ≈ 5 Hz.

### S2E — `s2e.onnx`

**Paper:** *From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning* (ICLR 2026) · [arXiv:2507.22028](https://arxiv.org/abs/2507.22028) · [project](https://metadriverse.github.io/s2e)

UCLA-VAIL navigation foundation model; this is the behavior-cloning, point-goal, web-pretrained variant (`S2EBC-PG-Web100`).
**ONNX I/O:** `obs_images (B,11,3,256,256)` in `[0,1]` (no ImageNet norm) + `goal (B,3)` = `[norm_dist, cos(θ), sin(θ)]` → `wp_pred (B,10,3)` `[x,y,yaw]`, `wp_pred_score (B,63)` mode scores. Frames are resized to 256×256 with OpenCV.

### MIMIC — `mimic.onnx`

**Paper:** *Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion* (ICRA 2026) · [arXiv:2603.22527](https://arxiv.org/abs/2603.22527) · [project](https://vail-ucla.github.io/MIMIC)

UCLA-VAIL goal-free long-context sidewalk policy. **ONNX I/O:** `input (1,16,3,288,512)` in `[0,1]` → `output (1,15,3)` `[x,y,yaw]` at non-uniform timestamps (0.2 s–5.0 s @ 5 Hz). Batch is processed one sample at a time; the wrapper keeps the first 13 waypoints (~4 s) and scales to meters. Requires `urbansim` (see Installation).

## Downloading

**Full repo** (includes the LFS-tracked ONNX weights):

```bash
hf download UCLA-VAIL/Navigation-Model-Zoo-Public --local-dir ./Navigation-Model-Zoo-Public
```

**One model** — fetch just its folder, e.g. MBRA:

```bash
hf download UCLA-VAIL/Navigation-Model-Zoo-Public \
  --include "MBRA_PG_Official/*" --local-dir .
```

Then run from the repo root: `from MBRA_PG_Official.inference import MBRAPGNavigator`.

> **External weights:** GNM, ViNT, and NoMaD ship `*.onnx.data` files — keep each `.onnx` and its
> `.onnx.data` together in the same folder so ONNX Runtime can resolve the weights.

## Intended use & limitations

These are **research artifacts** for navigation research, reproduction, and benchmarking — not safety-validated for deployment on real robots without additional testing. Each policy's behavior is bounded by its training distribution (camera intrinsics, embodiment, frame rate, environment). Several wrappers rectify/resize inputs to a specific training camera; mismatched cameras may degrade performance.

## License

Released under **Apache 2.0**.
Individual models carry the licenses and terms of their original sources (ViNT, GNM, NoMaD, CityWalker, MBRA) — check upstream before commercial use.

## Citation

If you use a model from this zoo, please cite its original paper.

**GNM**

```bibtex
@inproceedings{shah2023gnm,
  title={Gnm: A general navigation model to drive any robot},
  author={Shah, Dhruv and Sridhar, Ajay and Bhorkar, Arjun and Hirose, Noriaki and Levine, Sergey},
  booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={7226--7233},
  year={2023},
  organization={IEEE}
}
```

**ViNT**

```bibtex
@article{shah2023vint,
  title={ViNT: A foundation model for visual navigation},
  author={Shah, Dhruv and Sridhar, Ajay and Dashora, Nitish and Stachowicz, Kyle and Black, Kevin and Hirose, Noriaki and Levine, Sergey},
  journal={arXiv preprint arXiv:2306.14846},
  year={2023}
}
```

**NoMaD**

```bibtex
@inproceedings{sridhar2024nomad,
  title={Nomad: Goal masked diffusion policies for navigation and exploration},
  author={Sridhar, Ajay and Shah, Dhruv and Glossop, Catherine and Levine, Sergey},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={63--70},
  year={2024},
  organization={IEEE}
}
```

**CityWalker**

```bibtex
@inproceedings{liu2025citywalker,
  title={Citywalker: Learning embodied urban navigation from web-scale videos},
  author={Liu, Xinhao and Li, Jintong and Jiang, Yicheng and Sujay, Niranjan and Yang, Zhicheng and Zhang, Juexiao and Abanes, John and Zhang, Jing and Feng, Chen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={6875--6885},
  year={2025}
}
```

**MBRA**

```bibtex
@article{hirose2025learning,
  title={Learning to drive anywhere with model-based reannotation},
  author={Hirose, Noriaki and Ignatova, Lydia and Stachowicz, Kyle and Glossop, Catherine and Levine, Sergey and Shah, Dhruv},
  journal={IEEE Robotics and Automation Letters},
  volume={11},
  number={2},
  pages={1242--1249},
  year={2025},
  publisher={IEEE}
}
```

**S2E**

```bibtex
@article{he2025seeing,
  title={From seeing to experiencing: Scaling navigation foundation models with reinforcement learning},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2507.22028},
  year={2025}
}
```

**MIMIC**

```bibtex
@article{he2026learning,
  title={Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2603.22527},
  year={2026}
}
```

## Contact

Maintained by [UCLA-VAIL](https://vail-ucla.github.io/). Open an issue/discussion on the repository page for questions or contributions.