Add model card for Nav-R2
This PR adds a comprehensive model card for the Nav-R2 model, linking it to the paper [Nav-$R^2$: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation](https://huggingface.co/papers/2512.02400).
It includes essential metadata such as the `pipeline_tag: robotics`, `library_name: transformers`, and `license: apache-2.0`. It also incorporates relevant optional metadata like `base_model`, `datasets`, and additional descriptive `tags` for improved discoverability.
The model card features key visuals directly from the GitHub repository and provides a concise summary of the paper's abstract, adhering to the guidelines.
Please review and merge this PR if everything looks good.
README.md (ADDED)
---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
base_model: Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- Chrono666/Nav-R2-OVON-CoT-Dataset
tags:
- robotics
- navigation
- object-goal-navigation
- vision-language-model
- qwen
---

# Nav-$R^2$: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation

This repository contains the official implementation of the paper [Nav-$R^2$: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation](https://huggingface.co/papers/2512.02400).

Object-goal navigation in open-vocabulary settings requires agents to locate novel objects in unseen environments. Nav-$R^2$ proposes a framework that explicitly models target-environment and environment-action relationships through structured Chain-of-Thought (CoT) reasoning and a Similarity-Aware Memory. This approach enables state-of-the-art performance in localizing unseen objects efficiently while maintaining real-time inference.
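The Similarity-Aware Memory itself is specified in the paper; purely as a conceptual illustration of the general idea (not the authors' implementation), a memory that stores observation embeddings, skips near-duplicates, and retrieves the entries most similar to a query could be sketched as below. The class name, threshold, and retrieval logic are illustrative assumptions only.

```python
import numpy as np

class SimilarityAwareMemorySketch:
    """Conceptual sketch only (not the paper's implementation): store observation
    embeddings, skip near-duplicate entries, and retrieve the most similar ones."""

    def __init__(self, sim_threshold: float = 0.9):
        self.sim_threshold = sim_threshold  # illustrative cutoff for "redundant" observations
        self.embeddings: list[np.ndarray] = []

    @staticmethod
    def _cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def add(self, embedding: np.ndarray) -> bool:
        """Store an observation embedding unless it is too similar to an existing entry."""
        if any(self._cosine(embedding, e) >= self.sim_threshold for e in self.embeddings):
            return False  # redundant observation, not stored
        self.embeddings.append(embedding)
        return True

    def retrieve(self, query: np.ndarray, k: int = 3) -> list[np.ndarray]:
        """Return the k stored embeddings most similar to the query."""
        ranked = sorted(self.embeddings, key=lambda e: self._cosine(query, e), reverse=True)
        return ranked[:k]
```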
For more details on the code, installation, training, and evaluation, please refer to the [GitHub repository](https://github.com/AMAP-EAI/Nav-R2).
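## Usage

Since the card declares `library_name: transformers` with `Qwen/Qwen2.5-VL-7B-Instruct` as the base model, the checkpoint should load with the standard Qwen2.5-VL classes in a recent `transformers` release. The snippet below is a minimal sketch under that assumption: the repository id, image path, and prompt are placeholders, and the exact prompt format expected by Nav-$R^2$ may differ, so please check the GitHub repository for the supported interface.

```python
# Minimal loading sketch, assuming the checkpoint follows the standard Qwen2.5-VL format.
# "<org>/Nav-R2", "observation.png", and the instruction text are placeholders.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "<org>/Nav-R2"  # placeholder; replace with the actual Hub repository id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One first-person RGB observation plus an object-goal instruction.
image = Image.open("observation.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Find the nearest chair. Reason step by step, then output the next action."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]  # keep only the generated continuation
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```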
## Overview

<p align="center">
<img src="https://github.com/AMAP-EAI/Nav-R2/raw/main/figs/title.png" width="100%">
</p>
<p align="center">
<img src="https://github.com/AMAP-EAI/Nav-R2/raw/main/figs/teaser.png" width="100%">
</p>

### Pipeline and Structure
<p align="center">
<img src="https://github.com/AMAP-EAI/Nav-R2/raw/main/figs/pipeline.png" width="100%">
</p>

### Results on OVON
The results on the OVON dataset are shown below. Nav-R2 is trained via **ONLY SFT**, receives **ONLY RGB observations** from **ONLY the first-person view**, and achieves the best success rate (SR) on the val-unseen split.
<p align="center">
<img src="https://github.com/AMAP-EAI/Nav-R2/raw/main/figs/main-results.png" width="100%">
</p>

## Citation
If you find our work helpful or inspiring, please feel free to cite it.

```bibtex
@article{zhou2025navr2,
  title={Nav-R2: Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation},
  author={Authors names and affiliations will be added after review},
  journal={arXiv preprint arXiv:2512.02400},
  year={2025}
}
```