---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
base_model: Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- Chrono666/Nav-R2-OVON-CoT-Dataset
tags:
- robotics
- navigation
- object-goal-navigation
- vision-language-model
- qwen
---
# Nav-$R^2$: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation
This repository contains the official implementation of the paper Nav-$R^2$: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation.
Object-goal navigation in open-vocabulary settings requires agents to locate novel objects in unseen environments. Nav-$R^2$ proposes a framework that explicitly models target-environment and environment-action relationships through structured Chain-of-Thought (CoT) reasoning and a Similarity-Aware Memory. This approach enables state-of-the-art performance in localizing unseen objects efficiently while maintaining real-time inference.
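As a quick orientation, below is a minimal inference sketch using the standard `transformers` interface for the Qwen2.5-VL architecture declared in the metadata above. The checkpoint id `Chrono666/Nav-R2`, the image path, and the prompt wording are illustrative assumptions, not the official pipeline; see the GitHub repository for the actual inference code.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# NOTE: placeholder checkpoint id (an assumption); replace it with the
# id of this model card's repository.
MODEL_ID = "Chrono666/Nav-R2"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# One first-person RGB observation plus an open-vocabulary goal; the prompt
# below is illustrative, not the exact template used in training.
image = Image.open("observation.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Goal: find a potted plant. Reason step by step, then output the next action."},
    ],
}]

# Render the chat template, encode text + image, and generate the
# structured CoT reasoning followed by the predicted action.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```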
For more details on the code, installation, training, and evaluation, please refer to the GitHub repository.
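The CoT supervision data listed in the metadata can be inspected with the `datasets` library. A short sketch, assuming a standard `train` split:

```python
from datasets import load_dataset

# Dataset id from the metadata above; the split name "train" is an assumption.
ds = load_dataset("Chrono666/Nav-R2-OVON-CoT-Dataset", split="train")
print(ds[0])  # inspect one CoT-annotated navigation sample
```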
## Overview

## Pipeline and Structure
## Results on OVON
Results on the OVON dataset are shown below. Nav-$R^2$ is trained via **only** SFT, receives **only** RGB observations from **only** a first-person view, and achieves the best success rate (SR) on the val-unseen split.
## Citation
If you find our work helpful or inspiring, please consider citing it:
```bibtex
@article{zhou2025navr2,
  title={Nav-R2: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation},
  author={Authors names and affiliations will be added after review},
  journal={arXiv preprint arXiv:2512.02400},
  year={2025}
}
```