---
pipeline_tag: robotics
library_name: transformers
license: mit
---

This repository contains models for the **VLN-PE Benchmark**, presented in the paper [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://huggingface.co/papers/2507.13019). VLN-PE is a physically realistic Vision-and-Language Navigation platform that supports humanoid, quadruped, and wheeled robots, and systematically evaluates several ego-centric VLN methods in physical robotic settings. For more details, visit the [project page](https://crystalsixone.github.io/vln_pe.github.io/) or the main [GitHub repository](https://github.com/InternRobotics/InternNav).

## VLN-PE Benchmark
<table>
  <thead>
    <tr>
      <th rowspan="2">Model</th>
      <th rowspan="2">Dataset/Benchmark</th>
      <th colspan="7">Val Seen</th>
      <th colspan="7">Val Unseen</th>
      <th rowspan="2">Download</th>
    </tr>
    <tr>
      <th>TL</th><th>NE</th><th>FR</th><th>StR</th><th>OS</th><th>SR</th><th>SPL</th>
      <th>TL</th><th>NE</th><th>FR</th><th>StR</th><th>OS</th><th>SR</th><th>SPL</th>
    </tr>
  </thead>
  <tbody>
    <tr><td colspan="17"><em>Zero-shot transfer evaluation from VLN-CE</em></td></tr>
    <tr><td>Seq2Seq-Full</td><td>R2R VLN-PE</td><td>7.80</td><td>7.62</td><td>20.21</td><td>3.04</td><td>19.3</td><td>15.2</td><td>12.79</td><td>7.73</td><td>7.18</td><td>18.04</td><td>3.04</td><td>22.42</td><td>16.48</td><td>14.11</td><td>model</td></tr>
    <tr><td>CMA-Full</td><td>R2R VLN-PE</td><td>6.62</td><td>7.37</td><td>20.06</td><td>3.95</td><td>18.54</td><td>16.11</td><td>14.61</td><td>6.58</td><td>7.09</td><td>17.07</td><td>3.79</td><td>20.86</td><td>16.93</td><td>15.24</td><td>model</td></tr>
    <tr><td colspan="17"><em>Train on VLN-PE</em></td></tr>
    <tr><td>Seq2Seq</td><td>R2R VLN-PE</td><td>10.61</td><td>7.53</td><td>27.36</td><td>4.26</td><td>32.67</td><td>19.75</td><td>14.68</td><td>10.85</td><td>7.88</td><td>26.8</td><td>5.57</td><td>28.13</td><td>15.14</td><td>10.77</td><td>model</td></tr>
    <tr><td>CMA</td><td>R2R VLN-PE</td><td>11.13</td><td>7.59</td><td>23.71</td><td>3.19</td><td>34.94</td><td>21.58</td><td>16.1</td><td>11.16</td><td>7.98</td><td>22.64</td><td>3.27</td><td>33.11</td><td>19.15</td><td>14.05</td><td>model</td></tr>
    <tr><td>RDP</td><td>R2R VLN-PE</td><td>13.26</td><td>6.76</td><td>27.51</td><td>1.82</td><td>38.6</td><td>25.08</td><td>17.07</td><td>12.7</td><td>6.72</td><td>24.57</td><td>3.11</td><td>36.9</td><td>25.24</td><td>17.73</td><td>model</td></tr>
    <tr><td>Seq2Seq+</td><td>R2R VLN-PE</td><td>10.22</td><td>7.75</td><td>33.43</td><td>3.19</td><td>30.09</td><td>16.86</td><td>12.54</td><td>9.88</td><td>7.85</td><td>26.27</td><td>6.52</td><td>28.79</td><td>16.56</td><td>12.7</td><td>model</td></tr>
    <tr><td>CMA+</td><td>R2R VLN-PE</td><td>8.86</td><td>7.14</td><td>23.56</td><td>3.5</td><td>36.17</td><td>25.84</td><td>21.75</td><td>8.79</td><td>7.26</td><td>21.75</td><td>3.27</td><td>31.4</td><td>22.12</td><td>18.65</td><td>model</td></tr>
  </tbody>
</table>
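Most column headers above are the standard VLN metrics (TL: trajectory length, NE: navigation error, OS: oracle success, SR: success rate, SPL: success weighted by path length); FR and StR are the physical-robot metrics added by VLN-PE (fall rate and stuck rate). As a reference for how SPL is computed from per-episode results, here is a minimal illustrative sketch of the standard SPL definition (Anderson et al., 2018) — it is not code from this repository:

```python
def spl(episodes):
    """Success weighted by Path Length (SPL).

    episodes: list of (success, shortest_path_len, agent_path_len) tuples,
    where lengths are geodesic/traveled distances for one evaluation episode.
    Returns the mean of success * shortest / max(shortest, taken).
    """
    if not episodes:
        return 0.0
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # Successful episodes are weighted by path efficiency;
            # failed episodes contribute zero.
            total += shortest / max(shortest, taken)
    return total / len(episodes)
```

An agent that succeeds along the shortest path scores 1.0 for that episode; longer successful paths are penalized proportionally.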
## Citation

If you find our work helpful, please cite:

```bibtex
@inproceedings{vlnpe,
  title={Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities},
  author={Wang, Liuyi and Xia, Xinyuan and Zhao, Hui and Wang, Hanqing and Wang, Tai and Chen, Yilun and Liu, Chengju and Chen, Qijun and Pang, Jiangmiao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

@misc{internnav2025,
  title = {{InternNav: InternRobotics'} open platform for building generalized navigation foundation models},
  author = {InternNav Contributors},
  howpublished = {\url{https://github.com/InternRobotics/InternNav}},
  year = {2025}
}
```