
	---
	license: mit
	tags:
	- zero-shot evaluation
	- foundation models
	- visual navigation
	- robot learning
	- real-world evaluation
	- onnx
	pipeline_tag: robotics
	library_name: onnxruntime
	arxiv: 2603.25937
	base_model:
	- rail-berkeley/crossformer
	- robodhruv/visualnav-transformer
	- hren20/NaiviBridger
	---


	# Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned — ONNX Models

	ONNX-optimized exports of visual navigation models for deployment on physical robots (e.g., Boston Dynamics Spot, AgileX Limo, AgileX Bunker). These exports are derived from the original works listed below; all credit for the architectures and training goes to the respective authors.

	See https://github.com/MaevaGuerrier/vnm-zeroshot-eval for deployment instructions.
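
Before wiring an export into a robot stack, it can help to check that the file loads and to inspect its input and output signature. The sketch below is one way to do that with `onnxruntime`; it is not taken from the deployment repository. The function names `resolve_shape` and `run_with_dummy_inputs` are hypothetical, and it assumes every model input is a float32 tensor, which may not hold for every export in this repository.

```python
from typing import Dict, List, Sequence, Tuple, Union

# An ONNX input dimension is either a fixed int, a symbolic name, or None.
Dim = Union[int, str, None]


def resolve_shape(shape: Sequence[Dim], fill: int = 1) -> List[int]:
    """Replace dynamic dimensions (None or symbolic names) with a fixed size."""
    return [d if isinstance(d, int) and d > 0 else fill for d in shape]


def run_with_dummy_inputs(model_path: str) -> Dict[str, Tuple[int, ...]]:
    """Load an ONNX file, run it once on zero-filled inputs, return output shapes."""
    # Imported lazily so resolve_shape works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    feed = {}
    for inp in session.get_inputs():
        # Assumption: float32 inputs. Check inp.type if a model expects another dtype.
        feed[inp.name] = np.zeros(resolve_shape(inp.shape), dtype=np.float32)

    # Passing None as the first argument requests all model outputs.
    outputs = session.run(None, feed)
    return {
        out.name: tuple(arr.shape)
        for out, arr in zip(session.get_outputs(), outputs)
    }
```

Calling `run_with_dummy_inputs("path/to/model.onnx")` with the path of a downloaded export (the path here is a placeholder) returns a mapping from output names to shapes. Zero-filled inputs only confirm that the graph executes; they say nothing about navigation quality.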

	# Acknowledgements

	We would like to thank the authors of the following works, whose open-source models made this evaluation possible.
	- [GNM](https://arxiv.org/abs/2210.03370)
	- [ViNT](https://arxiv.org/abs/2306.14846)
	- [NoMaD](https://arxiv.org/abs/2310.07896)
	- [NaviBridger](https://arxiv.org/abs/2504.10041)
	- [CrossFormer](https://arxiv.org/abs/2408.11812)

	# Citations

	If you use this work, please cite:

	```bibtex
	@article{guerrier2026vnm,
	title = {Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned},
	author = {Guerrier, Maeva and Soma, Karthik and Pavlasek, Jana and Beltrame, Giovanni},
	journal = {arXiv preprint arXiv:2603.25937},
	year = {2026}
	}
	```

	Consider citing the original models as well:

	```bibtex
	@misc{shah2023gnmgeneralnavigationmodel,
	title={GNM: A General Navigation Model to Drive Any Robot},
	author={Dhruv Shah and Ajay Sridhar and Arjun Bhorkar and Noriaki Hirose and Sergey Levine},
	year={2023},
	eprint={2210.03370},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2210.03370},
	}
	```


	```bibtex
	@misc{shah2023vintfoundationmodelvisual,
	title={ViNT: A Foundation Model for Visual Navigation},
	author={Dhruv Shah and Ajay Sridhar and Nitish Dashora and Kyle Stachowicz and Kevin Black and Noriaki Hirose and Sergey Levine},
	year={2023},
	eprint={2306.14846},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2306.14846},
	}
	```


	```bibtex
	@misc{sridhar2023nomadgoalmaskeddiffusion,
	title={NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration},
	author={Ajay Sridhar and Dhruv Shah and Catherine Glossop and Sergey Levine},
	year={2023},
	eprint={2310.07896},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2310.07896},
	}
	```


	```bibtex
	@misc{ren2025priordoesmattervisual,
	title={Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models},
	author={Hao Ren and Yiming Zeng and Zetong Bi and Zhaoliang Wan and Junlong Huang and Hui Cheng},
	year={2025},
	eprint={2504.10041},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2504.10041},
	}
	```


	```bibtex
	@misc{doshi2024scalingcrossembodiedlearningpolicy,
	title={Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation},
	author={Ria Doshi and Homer Walke and Oier Mees and Sudeep Dasari and Sergey Levine},
	year={2024},
	eprint={2408.11812},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2408.11812},
	}
	```