---
license: mit
pipeline_tag: robotics
library_name: transformers
---

# VLN-PE Benchmark Models

This repository hosts models and results for the [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://huggingface.co/papers/2507.13019) benchmark.

VLN-PE is a physically realistic Vision-and-Language Navigation (VLN) platform that supports humanoid, quadruped, and wheeled robots. It aims to bridge the gap between the idealized assumptions of prior VLN benchmarks and the challenges of physical deployment by systematically evaluating ego-centric VLN methods across different technical pipelines.

* **Project Page**: https://crystalsixone.github.io/vln_pe.github.io/
* **Code Repository**: https://github.com/InternRobotics/InternNav
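To fetch a checkpoint programmatically, the snippet below is a minimal sketch using `huggingface_hub`; the `repo_id` and `filename` shown are illustrative placeholders rather than confirmed paths, so use the download links in the tables below for the exact locations.

```python
# Minimal sketch: fetch one of the benchmark checkpoints with huggingface_hub.
# NOTE: repo_id and filename are illustrative placeholders, not confirmed
# paths -- use the download links in the benchmark tables for real locations.
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="InternRobotics/VLN-PE",  # placeholder repository ID
    filename="cma_r2r_vln_pe.ckpt",   # placeholder checkpoint filename
)
print(f"Checkpoint downloaded to: {checkpoint_path}")
```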
## Benchmark Results

The following tables report results for various models evaluated on the VLN-PE platform, split into the **Val Seen** and **Val Unseen** splits.

**VLN-PE Benchmark**

Metric abbreviations: TL (trajectory length, m), NE (navigation error, m), FR (fall rate, %), StR (stuck rate, %), OS (oracle success rate, %), SR (success rate, %), SPL (success weighted by path length, %). Lower is better for NE, FR, and StR; higher is better for OS, SR, and SPL.

**Val Seen**

| Model | Dataset/Benchmark | TL | NE | FR | StR | OS | SR | SPL | Download |
|---|---|---|---|---|---|---|---|---|---|
| *Zero-shot transfer evaluation from VLN-CE* | | | | | | | | | |
| Seq2Seq-Full | R2R VLN-PE | 7.80 | 7.62 | 20.21 | 3.04 | 19.30 | 15.20 | 12.79 | model |
| CMA-Full | R2R VLN-PE | 6.62 | 7.37 | 20.06 | 3.95 | 18.54 | 16.11 | 14.61 | model |
| *Train on VLN-PE* | | | | | | | | | |
| Seq2Seq | R2R VLN-PE | 10.61 | 7.53 | 27.36 | 4.26 | 32.67 | 19.75 | 14.68 | model |
| CMA | R2R VLN-PE | 11.13 | 7.59 | 23.71 | 3.19 | 34.94 | 21.58 | 16.10 | model |
| RDP | R2R VLN-PE | 13.26 | 6.76 | 27.51 | 1.82 | 38.60 | 25.08 | 17.07 | model |
| Seq2Seq+ | R2R VLN-PE | 10.22 | 7.75 | 33.43 | 3.19 | 30.09 | 16.86 | 12.54 | model |
| CMA+ | R2R VLN-PE | 8.86 | 7.14 | 23.56 | 3.50 | 36.17 | 25.84 | 21.75 | model |

**Val Unseen**

| Model | Dataset/Benchmark | TL | NE | FR | StR | OS | SR | SPL | Download |
|---|---|---|---|---|---|---|---|---|---|
| *Zero-shot transfer evaluation from VLN-CE* | | | | | | | | | |
| Seq2Seq-Full | R2R VLN-PE | 7.73 | 7.18 | 18.04 | 3.04 | 22.42 | 16.48 | 14.11 | model |
| CMA-Full | R2R VLN-PE | 6.58 | 7.09 | 17.07 | 3.79 | 20.86 | 16.93 | 15.24 | model |
| *Train on VLN-PE* | | | | | | | | | |
| Seq2Seq | R2R VLN-PE | 10.85 | 7.88 | 26.80 | 5.57 | 28.13 | 15.14 | 10.77 | model |
| CMA | R2R VLN-PE | 11.16 | 7.98 | 22.64 | 3.27 | 33.11 | 19.15 | 14.05 | model |
| RDP | R2R VLN-PE | 12.70 | 6.72 | 24.57 | 3.11 | 36.90 | 25.24 | 17.73 | model |
| Seq2Seq+ | R2R VLN-PE | 9.88 | 7.85 | 26.27 | 6.52 | 28.79 | 16.56 | 12.70 | model |
| CMA+ | R2R VLN-PE | 8.79 | 7.26 | 21.75 | 3.27 | 31.40 | 22.12 | 18.65 | model |