Improve model card and add metadata #1
Opened by nielsr (HF Staff)
README.md (changed)

````diff
@@ -1,6 +1,8 @@
 ---
 language:
 - en
+license: cc-by-4.0
+pipeline_tag: robotics
 tags:
 - embodied-ai
 - aerial-vision-language-navigation
@@ -8,28 +10,34 @@ tags:
 - model-weights
 ---
 
-# WorldVLN Model
+# WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation
 
-This repository contains the model weights
-[WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation].
+This repository contains the model weights for WorldVLN, the first autoregressive world action model for aerial vision-language navigation (VLN).
 
-
+[**Paper**](https://huggingface.co/papers/2605.15964) | [**Project Page**](https://embodiedcity.github.io/WorldVLN/) | [**Code**](https://github.com/EmbodiedCity/WorldVLN.code)
 
-
-
+WorldVLN formulates aerial navigation as a prediction-driven world-action problem. It adapts a latent autoregressive video backbone to predict short-horizon world-state transitions and decodes them directly into executable waypoint actions. After each action segment is executed, newly received observations are encoded back into the autoregressive context, enabling closed-loop world-action prediction.
+
+## Model Weights
+This repository includes the weights for:
+- The world model backbone.
+- The action decoder.
+
+## Usage
+For detailed instructions on installation, setup, and inference (including the autoregressive I/O protocol), please refer to the [official GitHub repository](https://github.com/EmbodiedCity/WorldVLN.code).
 
 ## Citation
 
-If this work
+If this work is useful for your research, please cite:
 
 ```bibtex
 @misc{zhao2026worldvln,
-title={WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation},
+title={WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation},
 author={Baining Zhao and Jiacheng Xu and Weicheng Feng and Xin Zhang and Zhaolu Wang and Haoyang Wang and Shilong Ji and Ziyou Wang and Jianjie Fang and Zhiheng Zheng and Weichen Zhang and Yu Shang and Wei Wu and Chen Gao and Xinlei Chen and Yong Li},
 year={2026},
 eprint={2605.15964},
 archivePrefix={arXiv},
 primaryClass={cs.RO},
-url={https://arxiv.org/abs/2605.15964},
+url={https://arxiv.org/abs/2605.15964},
 }
-```
+```
````
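The closed-loop cycle described in the updated model card (predict short-horizon world-state transitions, decode them into waypoint actions, execute the segment, then encode the new observation back into the autoregressive context) can be illustrated with a toy sketch. This is not the WorldVLN implementation or API: `ToyWorldModel`, `decode_waypoints`, `ToySimulator`, and `navigate` are hypothetical names, a 3-D position stands in for the video latents, and the horizon, step size, and drift values are assumptions. The official GitHub repository remains the reference for the actual inference and I/O protocol.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # toy "world state": drone position (x, y, z)


def distance(a: Vec3, b: Vec3) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def move_toward(src: Vec3, dst: Vec3, max_step: float) -> Vec3:
    """Move from src toward dst by at most max_step (Euclidean)."""
    d = distance(src, dst)
    if d <= max_step:
        return dst
    scale = max_step / d
    return tuple(s + (t - s) * scale for s, t in zip(src, dst))


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for the latent autoregressive video backbone.

    A "latent" here is just a position; the real model predicts video
    latents conditioned on the language instruction.
    """
    goal: Vec3          # stands in for the instruction's target (assumed)
    horizon: int = 3    # predicted steps per action segment (assumed)
    step: float = 1.0   # max displacement per predicted step (assumed)

    def encode(self, observation: Vec3) -> Vec3:
        # Hypothetical encoder: identity on the toy observation.
        return observation

    def predict(self, context: Sequence[Vec3]) -> List[Vec3]:
        # Roll the state forward autoregressively: each predicted step is
        # conditioned on the previous one, starting from the latest context entry.
        states: List[Vec3] = []
        current = context[-1]
        for _ in range(self.horizon):
            current = move_toward(current, self.goal, self.step)
            states.append(current)
        return states


def decode_waypoints(predicted: Sequence[Vec3]) -> List[Vec3]:
    # Hypothetical action decoder: use each predicted state as a waypoint.
    return list(predicted)


@dataclass
class ToySimulator:
    position: Vec3 = (0.0, 0.0, 0.0)
    drift: Vec3 = (0.0, 0.0, 0.0)  # fixed offset on every reached waypoint, e.g. wind (assumed)

    def execute(self, waypoints: Sequence[Vec3]) -> Vec3:
        # Fly each waypoint; the drone ends up offset from it by `drift`.
        for wp in waypoints:
            self.position = tuple(w + d for w, d in zip(wp, self.drift))
        return self.position  # the new observation after the segment


def navigate(model: ToyWorldModel, sim: ToySimulator,
             max_segments: int = 10, tol: float = 0.5) -> Tuple[int, List[Vec3]]:
    context: List[Vec3] = [model.encode(sim.position)]
    for segment in range(1, max_segments + 1):
        predicted = model.predict(context)         # short-horizon world-state transitions
        waypoints = decode_waypoints(predicted)    # decoded into executable actions
        observation = sim.execute(waypoints)       # execute the action segment
        context.append(model.encode(observation))  # feed the real observation back (closed loop)
        if distance(observation, model.goal) <= tol:
            return segment, context
    return max_segments, context


if __name__ == "__main__":
    used, ctx = navigate(ToyWorldModel(goal=(10.0, 0.0, 0.0)),
                         ToySimulator(drift=(0.0, 0.2, 0.0)))
    print(used, ctx[-1])
```

Because each new segment is predicted from the observed position rather than from the previous prediction, the fixed `drift` offset in the usage example does not accumulate across segments: the loop re-plans from where the drone actually is.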