Add model card and metadata for StereoNav

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ ---
+ license: mit
+ pipeline_tag: robotics
+ ---
+
+ # StereoNav: What Limits Vision-and-Language Navigation?
+
+ [**Paper**](https://huggingface.co/papers/2605.13328) | [**Project Page**](https://yunheng-wang.github.io/stereonav-public.github.io/) | [**GitHub**](https://github.com/Yunheng-Wang/StereoNav)
+
+ StereoNav is a robust Vision-Language-Action (VLA) framework designed to enhance the consistency of real-world robot navigation. It addresses the gaps between simulation and physical execution by introducing Target-Location Priors and leveraging stereo vision to construct a unified representation of semantics and geometry.
+
+ ## Overview
+
+ Current Vision-and-Language Navigation (VLN) agents often suffer from performance degradation in the real world due to perceptual instability (lighting variations, motion blur) and under-specified instructions. StereoNav addresses these challenges through:
+
+ - **Target-Location Priors:** A persistent bridge between synthetic training and physical execution that provides stable visual guidance.
+ - **Stereo Vision:** Construction of a unified representation of semantics and geometry for enhanced depth awareness and precise action prediction.
+ - **Efficiency:** Achieves state-of-the-art performance on benchmarks such as R2R-CE and RxR-CE with significantly fewer parameters and less training data than prior scaling-based approaches.
+
+ ## Usage
+
+ StereoNav runs as a multi-component system consisting of an inference server, an action server, and a camera server. For detailed instructions on environment setup, training, and deployment on robotic platforms (such as the Unitree G1), refer to the [official GitHub repository](https://github.com/Yunheng-Wang/StereoNav).
+
+ ## Citation
+
+ If you find this work useful, please consider citing:
+
+ ```bibtex
+ @article{stereonav2026,
+ title = {What Limits Vision-and-Language Navigation?},
+ author = {Yunheng Wang and Yuetong Fang and Taowen Wang and Lusong Li and Kun Liu and Junzhe Xu and Zizhao Yuan and Yixiao Feng and Jiaxi Zhang and Wei Lu and Zecui Zeng and Renjing Xu},
+ journal = {arXiv preprint},
+ year = {2026},
+ }
+ ```
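
The block between the two `---` lines at the top of the added README (`license: mit`, `pipeline_tag: robotics`) is YAML front matter, which the Hub reads as model card metadata. As a rough illustration of how that block is delimited, here is a minimal Python sketch that extracts it using only the standard library. The names `parse_front_matter` and `README_HEAD` are hypothetical and not part of this PR or of any Hugging Face library, and the parser assumes flat `key: value` pairs only; anything beyond that (lists, nesting, quoting) would need a real YAML parser.

```python
def parse_front_matter(readme_text: str) -> dict:
    """Return flat key/value pairs from a leading '---' ... '---' block.

    Illustrative only: handles simple `key: value` lines, not full YAML.
    """
    lines = readme_text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return metadata
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the file as having no front matter.
    return {}


# The first lines of the README added in this PR.
README_HEAD = """---
license: mit
pipeline_tag: robotics
---

# StereoNav: What Limits Vision-and-Language Navigation?
"""

metadata = parse_front_matter(README_HEAD)
print(metadata)  # {'license': 'mit', 'pipeline_tag': 'robotics'}
```

A check like this can be handy in review to confirm the metadata block is well-formed before merging, for example that it opens on the first line and has a closing `---`.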