anonymous-stereonav/StereoNav_Model

Add model card and metadata for StereoNav

by nielsr HF Staff - opened May 14

base: refs/heads/main ← from: refs/pr/1

README.md +35 -0

	@@ -0,0 +1,35 @@

+---
+license: mit
+pipeline_tag: robotics
+---
+# StereoNav: What Limits Vision-and-Language Navigation?
+[**Paper**](https://huggingface.co/papers/2605.13328) | [**Project Page**](https://yunheng-wang.github.io/stereonav-public.github.io/) | [**GitHub**](https://github.com/Yunheng-Wang/StereoNav)
+StereoNav is a robust Vision-Language-Action (VLA) framework designed to enhance the consistency of real-world robot navigation. It addresses the gaps between simulation and physical execution by introducing Target-Location Priors and leveraging stereo vision to construct a unified representation of semantics and geometry.
+## Overview
+Current Vision-and-Language Navigation (VLN) agents often suffer from performance degradation in the real world due to perceptual instability (lighting variations, motion blur) and under-specified instructions. StereoNav addresses these challenges through:
+- **Target-Location Priors:** A persistent bridge between synthetic training and physical execution that provides stable visual guidance.
+- **Stereo Vision:** Construction of a unified representation of semantics and geometry for enhanced depth awareness and precise action prediction.
+- **Efficiency:** Achieves state-of-the-art performance on benchmarks such as R2R-CE and RxR-CE with significantly fewer parameters and less training data than prior scaling-based approaches.
+## Usage
+StereoNav runs as three cooperating components: an inference server, an action server, and a camera server. For environment setup, training, and deployment on robotic platforms such as the Unitree G1, see the [official GitHub repository](https://github.com/Yunheng-Wang/StereoNav).
+## Citation
+If you find this work useful, please consider citing:
+```bibtex
+@article{stereonav2026,
+  title     = {What Limits Vision-and-Language Navigation?},
+  author    = {Yunheng Wang and Yuetong Fang and Taowen Wang and Lusong Li and Kun Liu and Junzhe Xu and Zizhao Yuan and Yixiao Feng and Jiaxi Zhang and Wei Lu and Zecui Zeng and Renjing Xu},
+  journal   = {arXiv preprint},
+  year      = {2026},
+}
+```
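
Review note on the Usage section: the README names three components (camera server, inference server, action server) but does not show how they fit together. The sketch below illustrates one plausible control loop under the assumption that each step captures a stereo pair, asks the model for an action given the instruction, and then executes that action. Every name here (`StereoFrame`, `Action`, `run_episode`, `scripted_policy`, and the action labels) is hypothetical and is not taken from the StereoNav repository; the three callables stand in for the real servers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StereoFrame:
    """One synchronized stereo pair; raw bytes stand in for real images."""
    left: bytes
    right: bytes


@dataclass
class Action:
    """Hypothetical discrete action label such as 'forward' or 'stop'."""
    name: str


def run_episode(
    capture: Callable[[], StereoFrame],            # stand-in for the camera server
    infer: Callable[[str, StereoFrame], Action],   # stand-in for the inference server
    execute: Callable[[Action], None],             # stand-in for the action server
    instruction: str,
    max_steps: int = 10,
) -> List[Action]:
    """Run capture -> infer -> execute until 'stop' or max_steps is reached."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = capture()
        action = infer(instruction, frame)
        execute(action)
        history.append(action)
        if action.name == "stop":
            break
    return history


def scripted_policy(names: List[str]) -> Callable[[str, StereoFrame], Action]:
    """Toy replacement for the model: replays a fixed list, then emits 'stop'."""
    remaining = iter(names)

    def infer(instruction: str, frame: StereoFrame) -> Action:
        return Action(next(remaining, "stop"))

    return infer


# Usage with stubbed components (no robot, camera, or model required).
executed: List[Action] = []
actions = run_episode(
    capture=lambda: StereoFrame(left=b"", right=b""),
    infer=scripted_policy(["forward", "turn_left", "stop"]),
    execute=executed.append,
    instruction="go to the kitchen",
)
print([a.name for a in actions])  # ['forward', 'turn_left', 'stop']
```

Passing the three components as plain callables keeps the loop independent of any transport; in a real deployment each would presumably wrap a network call to the corresponding server described in the repository.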