license: apache-2.0
tags:
- robotics
- navigation
- visual-navigation
- embodied-ai
- onnx
pipeline_tag: robotics
Navigation Model Zoo
A collection of vision-based navigation policies exported to ONNX, each wrapped in a small, uniform Python inference API. Maintained by Honglin He @ UCLA-VAIL.
Every model takes a short history of RGB frames and predicts a local trajectory (and optionally a
distance-to-goal / arrival signal); a built-in PD controller turns the trajectory into (v, ω)
velocity commands. All models share the same wrapper interface so they can be swapped and
benchmarked without per-model glue code.
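To make the trajectory-to-command step concrete, here is a minimal sketch of a PD-style heading controller. It is not the controller shipped in the wrappers: the function name `waypoint_to_vw`, the gains `kp` and `kd`, the time step `dt`, and the speed limits are illustrative assumptions.

```python
import numpy as np

def waypoint_to_vw(waypoint_xy, prev_heading_err=0.0, dt=0.2,
                   kp=1.5, kd=0.1, max_v=0.5, max_w=1.0):
    """Map one local waypoint (x=forward, y=left, meters) to (v, w).

    All gains and limits here are hypothetical defaults, not the zoo's values.
    """
    x, y = float(waypoint_xy[0]), float(waypoint_xy[1])
    heading_err = np.arctan2(y, x)                  # angle to the waypoint, rad
    d_err = (heading_err - prev_heading_err) / dt   # derivative term
    w = np.clip(kp * heading_err + kd * d_err, -max_w, max_w)
    # Scale speed by cos(error): slow down off-axis, stop if the waypoint is behind.
    v = np.clip(np.hypot(x, y) / dt, 0.0, max_v) * max(np.cos(heading_err), 0.0)
    return float(v), float(w), float(heading_err)
```

Feeding the returned heading error back in as `prev_heading_err` on the next call provides the derivative term; resetting it to zero between episodes plays the same role as `nav.reset()` in the wrapper.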
Models
| Folder | Model / paper | Goal mode | Context | Input H×W | Waypoints | Weights |
|---|---|---|---|---|---|---|
| GNM_GL_Official | GNM · ICRA 2023 | goal-free | 6 | 64×85 | 5 | gnm_imagegoal.onnx (+.data) · 35 MB |
| Vint_GL_Official | ViNT · CoRL 2023 | goal-free | 6 | 64×85 | 5 | vint_imagegoal.onnx (+.data) · 97 MB |
| NoMaD_GL_Official | NoMaD · ICRA 2024 | goal-free (diffusion) | 4 | 96×96 | 8 × 8 samples | 3× .onnx (+.data) · 111 MB |
| CityWalker_PG_Official | CityWalker · CVPR 2025 | point-goal | 5 | 350×630 | 5 | citywalker.onnx · 806 MB |
| MBRA_PG_Official | MBRA · RA-L 2025 | point-goal | 6 | 96×96 | 8 | mbra.onnx · 254 MB |
| S2E | S2E · ICLR 2026 | point-goal / goal-free | 11 | 256×256 | 10 | s2e.onnx · 382 MB |
| MIMIC | MIMIC · ICRA 2026 | goal-free | 16 | 288×512 | 13 | mimic.onnx · 318 MB |
Suffix legend: PG = point-goal, GL = goal-less (goal-free). Models with a .onnx.data companion
(GNM, ViNT, NoMaD) use ONNX external weights — keep each .onnx and its .onnx.data together.
Common interface
Each folder is a self-contained module exposing one navigator class. They all follow the same contract:
import numpy as np
from MBRA_PG_Official.inference import MBRAPGNavigator # run from the repo root
nav = MBRAPGNavigator(device="cuda") # use device="cpu" if you have no GPU
# obs: (B, nav.context_size, 3, H, W) float32 in [0, 1]
# the wrapper resizes & normalizes to the model's spec internally
obs = np.random.rand(1, nav.context_size, 3, 96, 96).astype(np.float32)
# Point-goal models take goal_xy (standard frame: x=forward, y=left, meters);
# goal-free models omit it.
traj, scores = nav.inference_trajectory(obs, goal_xy=np.array([5.0, 0.2])) # (B, M, W, 2) meters
vw, best = nav.inference_vw(obs, goal_xy=np.array([5.0, 0.2])) # vw: (B, 2) = [v, ω]
nav.reset() # clears PD-controller velocity smoothing between episodes
Conventions shared by every model:
- Coordinate frame: all user-facing inputs/outputs are in the standard frame (x = forward, y = left, in meters). Models with a different internal convention (e.g. CityWalker) convert transparently.
- Observations: (B, context_size, 3, H, W), float32, pixel values in [0, 1]. The wrapper handles resize and any ImageNet normalization. (Exception: MIMIC expects frames already at 288×512 and does not resize.)
- inference_trajectory(obs[, goal_xy]) → (trajectory, scores). trajectory is (B, M, W, 2) in meters, where M is the number of modes (1 for unimodal, 8 for NoMaD) and W the waypoint count; scores is (B, M).
- inference_vw(obs[, goal_xy]) → (vw, best_traj), where vw is a (B, 2) torch tensor of [linear_v, angular_w]. Tune limits with max_v / max_w at construction.
- Goal-free models (Vint, GNM, NoMaD, MIMIC) ignore goal_xy; call inference_trajectory(obs).
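The observation convention above can be met from raw camera frames with a few lines of NumPy. The sketch below assumes the frames arrive as H×W×3 uint8 RGB arrays; the helper name `frames_to_obs` is hypothetical, and the temporal order the wrappers expect (oldest or newest first) is not stated here, so check the wrapper before relying on it.

```python
import numpy as np

def frames_to_obs(frames):
    """Stack HxWx3 uint8 RGB frames into (1, T, 3, H, W) float32 in [0, 1]."""
    arr = np.stack(frames, axis=0)            # (T, H, W, 3), uint8
    arr = arr.astype(np.float32) / 255.0      # scale pixel values to [0, 1]
    arr = np.transpose(arr, (0, 3, 1, 2))     # channels first: (T, 3, H, W)
    return arr[None]                          # add batch dim: (1, T, 3, H, W)

# Six dummy white frames, matching a context size of 6 at 96x96.
frames = [np.full((96, 96, 3), 255, dtype=np.uint8) for _ in range(6)]
obs = frames_to_obs(frames)
```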
Installation
pip install onnxruntime-gpu numpy torch torchvision pyyaml pillow
# CPU-only: use onnxruntime instead of onnxruntime-gpu
pip install opencv-python # required by S2E (frame resizing)
Optional, lab-internal dependency: Vint, GNM, and NoMaD expose an extra inference_vw_pp()
method that uses urbansim.custom.pp.PurePursuitController; it is imported lazily and only needed
for that method. MIMIC imports urbansim at module load, so its inference.py will not import
without the urbansim package on your path.
Model details
GNM_GL_Official — gnm_imagegoal.onnx (+ .onnx.data)
Paper: GNM: A General Navigation Model to Drive Any Robot (ICRA 2023) · arXiv:2210.03370 · code
Goal-free General Navigation Model — same NavDP image-goal I/O contract as ViNT (obs_img (B,18,64,85) + goal_img (B,3,64,85) → dist_pred (B,1), action_pred (B,5,4)), with a lower top speed. Expects input downsampled to ≈ 3 Hz.
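Because the model expects frames at roughly 3 Hz, a faster camera stream has to be thinned before it fills the context window. The following is one possible sketch of a rate-limited frame buffer; the class `FrameBuffer` and its methods are hypothetical and not part of the zoo's API.

```python
from collections import deque

class FrameBuffer:
    """Keep the latest `context_size` frames, accepting at most `rate_hz` frames per second."""

    def __init__(self, context_size, rate_hz):
        self.frames = deque(maxlen=context_size)  # oldest frame is dropped automatically
        self.min_dt = 1.0 / rate_hz
        self.last_t = None

    def push(self, frame, t):
        """Offer a frame captured at time t (seconds); return True if it was kept."""
        # Small epsilon so timestamps that land exactly on the period are accepted.
        if self.last_t is not None and t - self.last_t < self.min_dt - 1e-9:
            return False
        self.frames.append(frame)
        self.last_t = t
        return True

    def ready(self):
        """True once the buffer holds a full context window."""
        return len(self.frames) == self.frames.maxlen
```

With a 30 Hz camera and `FrameBuffer(6, 3.0)`, every tenth frame is kept and the buffer becomes ready after about two seconds.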
Vint_GL_Official — vint_imagegoal.onnx (+ .onnx.data)
Paper: ViNT: A Foundation Model for Visual Navigation (CoRL 2023) · arXiv:2306.14846 · project
Goal-free ViNT (NavDP image-goal backbone run with a random goal image). ONNX I/O: obs_img (B,18,64,85) (6 ImageNet-normalized frames × 3 ch) + goal_img (B,3,64,85) (random noise) → dist_pred (B,1), action_pred (B,5,4). Cumulative xy is already baked in; the wrapper scales by the 0.8 m metric spacing. Expects input downsampled to ≈ 3 Hz.
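The obs_img layout described above (6 frames × 3 channels folded into 18 channels) can be illustrated as follows. This is a sketch rather than the wrapper's code: the helper `pack_obs_img` is hypothetical, the frame order along the channel axis is assumed to follow the input order, and drawing the placeholder goal image from a standard normal is an assumption about what "random noise" means. The mean and standard deviation are the usual ImageNet statistics.

```python
import numpy as np

# Standard ImageNet channel statistics, shaped to broadcast over (B, T, 3, H, W).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 1, 3, 1, 1)

def pack_obs_img(obs):
    """(B, T, 3, H, W) in [0, 1] -> (B, T*3, H, W), ImageNet-normalized."""
    b, t, c, h, w = obs.shape
    normed = (obs - IMAGENET_MEAN) / IMAGENET_STD
    # Fold time into channels: output channel index = frame_index * 3 + rgb_index.
    return normed.reshape(b, t * c, h, w).astype(np.float32)

obs = np.random.rand(2, 6, 3, 64, 85).astype(np.float32)
obs_img = pack_obs_img(obs)                                   # (2, 18, 64, 85)
goal_img = np.random.randn(2, 3, 64, 85).astype(np.float32)   # placeholder goal (assumed distribution)
```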
NoMaD_GL_Official — 3× ONNX (diffusion, + .onnx.data)
Paper: NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration (ICRA 2024) · arXiv:2310.07896 · project
Goal-free diffusion policy. Runs a 10-step DDPM loop (squaredcos_cap_v2) over 3 components:
nomad_vision_encoder.onnx (obs_img (B,12,96,96) + goal_img (B,3,96,96) + goal_mask (B) → cond (B,256)), nomad_noise_pred.onnx (one denoising step), and nomad_dist_pred.onnx. Produces 8 trajectory samples → trajectory (B,8,8,2) meters (decode: unnormalize → cumsum → ×0.267 m spacing). This is the only multi-modal model and the slowest (diffusion + multiple samples).
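The decode chain (unnormalize → cumsum → ×0.267 m) might look roughly like the sketch below. It assumes the denoised actions are per-step deltas min-max normalized to [-1, 1]; the bounds `ACTION_MIN` and `ACTION_MAX` are placeholder values chosen for illustration, and the function name `decode_samples` is hypothetical. The real statistics live in the wrapper.

```python
import numpy as np

# Hypothetical per-axis delta bounds (placeholders, not the wrapper's values).
ACTION_MIN = np.array([-2.5, -4.0], dtype=np.float32)
ACTION_MAX = np.array([5.0, 4.0], dtype=np.float32)
SPACING_M = 0.267  # metric spacing quoted above

def decode_samples(naction):
    """(B, M, W, 2) normalized deltas in [-1, 1] -> (B, M, W, 2) waypoints in meters."""
    # Undo the assumed min-max normalization.
    deltas = (naction + 1.0) / 2.0 * (ACTION_MAX - ACTION_MIN) + ACTION_MIN
    # Accumulate deltas along the waypoint axis to get positions.
    waypoints = np.cumsum(deltas, axis=2)
    return waypoints * SPACING_M
```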
CityWalker_PG_Official — citywalker.onnx
Paper: CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos (CVPR 2025) · arXiv:2411.17820 · project
Point-goal urban walker. ONNX I/O: obs_images (B,5,3,350,630) + trajectory (B,6,2) past waypoints → wp_pred (B,5,2), arrive_pred (B,1) (arrival probability). Images are ImageNet-normalized internally; the model's internal y=forward, x=right frame is converted to standard frame by the wrapper. Input rate ≈ 5 Hz.
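The frame conversion is a 90° axis swap: the internal forward axis (y) becomes standard x, and the internal right axis (x) becomes negative standard y, since standard y points left. A minimal sketch follows; the function names are hypothetical and the wrapper's own implementation may differ.

```python
import numpy as np

def internal_to_standard(xy):
    """(..., 2) in [x=right, y=forward] -> (..., 2) in [x=forward, y=left]."""
    xy = np.asarray(xy, dtype=np.float32)
    return np.stack([xy[..., 1], -xy[..., 0]], axis=-1)

def standard_to_internal(xy):
    """(..., 2) in [x=forward, y=left] -> (..., 2) in [x=right, y=forward]."""
    xy = np.asarray(xy, dtype=np.float32)
    return np.stack([-xy[..., 1], xy[..., 0]], axis=-1)
```

For example, a point 1 m to the right and 2 m ahead in the internal frame maps to (2, -1) in the standard frame.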
MBRA_PG_Official — mbra.onnx
Paper: Learning to Drive Anywhere with Model-Based Reannotation (RA-L 2025) · arXiv:2505.05592 · project
Point-goal policy. ONNX I/O: obs_images (B,6,3,96,96) ImageNet-normalized + goal_pose (B,4) = [x, y, sin(yaw), cos(yaw)] → waypoints (B,8,4). Goal is given as goal_xy (meters) and converted internally; waypoints are un-normalized by a 0.8 m metric spacing. Input rate ≈ 5 Hz.
S2E — s2e.onnx
Paper: From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning (ICLR 2026) · arXiv:2507.22028 · project
UCLA-VAIL navigation foundation model; this is the behavior-cloning, point-goal, web-pretrained variant (S2EBC-PG-Web100). ONNX I/O: obs_images (B,11,3,256,256) in [0,1] (no ImageNet norm) + goal (B,3) = [norm_dist, cos(θ), sin(θ)] → wp_pred (B,10,3) [x,y,yaw], wp_pred_score (B,63) mode scores. Frames are resized to 256×256 with OpenCV.
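As an illustration of the goal layout [norm_dist, cos(θ), sin(θ)], the sketch below encodes a standard-frame goal_xy. Several details are assumptions not confirmed by this card: θ is taken as the bearing from the forward axis (positive to the left), distance is normalized by a hypothetical constant `MAX_GOAL_DIST_M` and clipped to [0, 1], and the helper name `encode_goal` is made up.

```python
import numpy as np

MAX_GOAL_DIST_M = 20.0  # hypothetical normalization constant, not from the wrapper

def encode_goal(goal_xy, max_dist=MAX_GOAL_DIST_M):
    """goal_xy (2,) or (B, 2) in meters -> (B, 3) = [norm_dist, cos(theta), sin(theta)]."""
    goal_xy = np.atleast_2d(np.asarray(goal_xy, dtype=np.float32))  # (B, 2)
    dist = np.linalg.norm(goal_xy, axis=-1)
    theta = np.arctan2(goal_xy[:, 1], goal_xy[:, 0])   # bearing from the forward axis
    norm_dist = np.clip(dist / max_dist, 0.0, 1.0)     # assumed normalization
    return np.stack([norm_dist, np.cos(theta), np.sin(theta)], axis=-1).astype(np.float32)
```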
MIMIC — mimic.onnx
Paper: Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion (ICRA 2026) · arXiv:2603.22527 · project
UCLA-VAIL goal-free long-context sidewalk policy. ONNX I/O: input (1,16,3,288,512) in [0,1] → output (1,15,3) [x,y,yaw] at non-uniform timestamps (0.2 s–5.0 s @ 5 Hz). Batch is processed one sample at a time; the wrapper keeps the first 13 waypoints (~4 s) and scales to meters. Requires urbansim (see Installation).
Downloading
Full repo (includes the LFS-tracked ONNX weights):
hf download UCLA-VAIL/Navigation-Model-Zoo-Public --local-dir ./Navigation-Model-Zoo-Public
One model — fetch just its folder, e.g. MBRA:
hf download UCLA-VAIL/Navigation-Model-Zoo-Public \
--include "MBRA_PG_Official/*" --local-dir .
Then run from the repo root: from MBRA_PG_Official.inference import MBRAPGNavigator.
External weights: GNM, ViNT, and NoMaD ship *.onnx.data files; keep each .onnx and its .onnx.data together in the same folder so ONNX Runtime can resolve the weights.
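After a partial download it can be worth confirming that the companion files arrived. The helper below is a small sketch using only the standard library; the name `missing_external_data` is hypothetical, and you pass in the .onnx file names yourself because not every model uses external weights.

```python
from pathlib import Path

def missing_external_data(folder, onnx_names):
    """Return the names in `onnx_names` whose .onnx file or .onnx.data companion is absent."""
    folder = Path(folder)
    missing = []
    for name in onnx_names:
        onnx_path = folder / name
        data_path = onnx_path.with_name(onnx_path.name + ".data")  # e.g. model.onnx.data
        if not onnx_path.is_file() or not data_path.is_file():
            missing.append(name)
    return missing
```

For instance, `missing_external_data("GNM_GL_Official", ["gnm_imagegoal.onnx"])` returns an empty list when both files sit side by side.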
Intended use & limitations
These are research artifacts for navigation research, reproduction, and benchmarking — not safety-validated for deployment on real robots without additional testing. Each policy's behavior is bounded by its training distribution (camera intrinsics, embodiment, frame rate, environment). Several wrappers rectify/resize inputs to a specific training camera; mismatched cameras may degrade performance.
License
Released under Apache 2.0. Individual models carry the licenses and terms of their original sources (ViNT, GNM, NoMaD, CityWalker, MBRA) — check upstream before commercial use.
Citation
If you use a model from this zoo, please cite its original paper.
GNM
@inproceedings{shah2023gnm,
title={{GNM}: A general navigation model to drive any robot},
author={Shah, Dhruv and Sridhar, Ajay and Bhorkar, Arjun and Hirose, Noriaki and Levine, Sergey},
booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},
pages={7226--7233},
year={2023},
organization={IEEE}
}
ViNT
@article{shah2023vint,
title={ViNT: A foundation model for visual navigation},
author={Shah, Dhruv and Sridhar, Ajay and Dashora, Nitish and Stachowicz, Kyle and Black, Kevin and Hirose, Noriaki and Levine, Sergey},
journal={arXiv preprint arXiv:2306.14846},
year={2023}
}
NoMaD
@inproceedings{sridhar2024nomad,
title={{NoMaD}: Goal masked diffusion policies for navigation and exploration},
author={Sridhar, Ajay and Shah, Dhruv and Glossop, Catherine and Levine, Sergey},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
pages={63--70},
year={2024},
organization={IEEE}
}
CityWalker
@inproceedings{liu2025citywalker,
title={{CityWalker}: Learning embodied urban navigation from web-scale videos},
author={Liu, Xinhao and Li, Jintong and Jiang, Yicheng and Sujay, Niranjan and Yang, Zhicheng and Zhang, Juexiao and Abanes, John and Zhang, Jing and Feng, Chen},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={6875--6885},
year={2025}
}
MBRA
@article{hirose2025learning,
title={Learning to drive anywhere with model-based reannotation},
author={Hirose, Noriaki and Ignatova, Lydia and Stachowicz, Kyle and Glossop, Catherine and Levine, Sergey and Shah, Dhruv},
journal={IEEE Robotics and Automation Letters},
volume={11},
number={2},
pages={1242--1249},
year={2025},
publisher={IEEE}
}
S2E
@article{he2025seeing,
title={From seeing to experiencing: Scaling navigation foundation models with reinforcement learning},
author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
journal={arXiv preprint arXiv:2507.22028},
year={2025}
}
MIMIC
@article{he2026learning,
title={Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion},
author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
journal={arXiv preprint arXiv:2603.22527},
year={2026}
}
Contact
Maintained by UCLA-VAIL. Open an issue/discussion on the repository page for questions or contributions.