---
license: cc-by-nc-4.0
---
|
|
Pretrained weights of [NaVid](https://pku-epic.github.io/NaVid/): Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024).
|
|
|
|
|
The model is trained on samples collected from the training splits of [VLN-CE](https://github.com/jacobkrantz/VLN-CE) R2R and RxR.
|
|
|
|
|
| Evaluation Benchmark | TL | NE | OS | SR | SPL |
|----------------------|:----:|:----:|:----:|:----:|:----:|
| VLN-CE R2R Val. | 10.7 | 5.65 | 49.2 | 41.9 | 36.5 |
| [VLN-CE R2R Test](https://eval.ai/web/challenges/challenge-page/719/leaderboard/1966) | 11.3 | 5.39 | 52 | 45 | 39 |
| VLN-CE RxR Val. | 15.4 | 5.72 | 55.6 | 45.7 | 38.2 |

TL: trajectory length (m); NE: navigation error (m); OS: oracle success rate (%); SR: success rate (%); SPL: success weighted by path length (%).
|
|
|
|
|
The related inference code can be found [here](https://github.com/jzhzhang/NaVid-VLN-CE).
|
|
|