Improve model card and add metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +18 -10
README.md CHANGED
@@ -1,6 +1,8 @@
  ---
  language:
  - en
+ license: cc-by-4.0
+ pipeline_tag: robotics
  tags:
  - embodied-ai
  - aerial-vision-language-navigation
@@ -8,28 +10,34 @@ tags:
  - model-weights
  ---

- # WorldVLN Model Weights
+ # WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation

- This repository contains the model weights introduced in the paper:
- [WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation].
+ This repository contains the model weights for WorldVLN, the first autoregressive world action model for aerial vision-language navigation (VLN).

- It includes the weights for the world model backbone and the action decoder.
+ [**Paper**](https://huggingface.co/papers/2605.15964) | [**Project Page**](https://embodiedcity.github.io/WorldVLN/) | [**Code**](https://github.com/EmbodiedCity/WorldVLN.code)

- For more details about the model and its implementation, please refer to the GitHub repository:
- https://github.com/EmbodiedCity/WorldVLN.code
+ WorldVLN formulates aerial navigation as a prediction-driven world-action problem. It adapts a latent autoregressive video backbone to predict short-horizon world-state transitions and decodes them directly into executable waypoint actions. After each action segment is executed, newly received observations are encoded back into the autoregressive context, enabling closed-loop world-action prediction.
+
+ ## Model Weights
+ This repository includes the weights for:
+ - The world model backbone.
+ - The action decoder.
+
+ ## Usage
+ For detailed instructions on installation, setup, and inference (including the autoregressive I/O protocol), please refer to the [official GitHub repository](https://github.com/EmbodiedCity/WorldVLN.code).

  ## Citation

- If this work has contributed to your research, welcome to cite it:
+ If this work is useful for your research, please cite:

  ```bibtex
  @misc{zhao2026worldvln,
- title={WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation},
+ title={WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation},
  author={Baining Zhao and Jiacheng Xu and Weicheng Feng and Xin Zhang and Zhaolu Wang and Haoyang Wang and Shilong Ji and Ziyou Wang and Jianjie Fang and Zhiheng Zheng and Weichen Zhang and Yu Shang and Wei Wu and Chen Gao and Xinlei Chen and Yong Li},
  year={2026},
  eprint={2605.15964},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
- url={https://arxiv.org/abs/2605.15964},
+ url={https://arxiv.org/abs/2605.15964},
  }
- ```
+ ```
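The new model card text describes a closed loop: predict a short-horizon world-state transition, decode it into waypoint actions, execute the segment, then encode the new observation back into the autoregressive context. The toy sketch below only illustrates that control flow. It is not the WorldVLN implementation or API: `ClosedLoopNavigator`, `run_episode`, and the `encode`/`predict`/`decode` callables are hypothetical stand-ins, and it assumes that only encoded observations are appended to the context. The actual I/O protocol is documented in the linked GitHub repository.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Waypoint = Tuple[float, float, float]  # hypothetical (x, y, z) waypoint


@dataclass
class ClosedLoopNavigator:
    """Toy closed-loop world-action loop; every component is an illustrative stand-in."""

    encode: Callable[[Any], List[float]]             # observation -> latent tokens
    predict: Callable[[List[float]], List[float]]    # context -> predicted transition
    decode: Callable[[List[float]], List[Waypoint]]  # transition -> waypoint segment
    context: List[float] = field(default_factory=list)

    def step(self, observation: Any) -> List[Waypoint]:
        # Encode the newest observation back into the autoregressive context.
        self.context.extend(self.encode(observation))
        # Predict a short-horizon transition from the full context.
        transition = self.predict(self.context)
        # Decode the predicted transition into an executable waypoint segment.
        return self.decode(transition)


def run_episode(nav, reset, execute, max_segments):
    """Alternate prediction and execution for up to max_segments segments."""
    observation = reset()
    trajectory = []
    for _ in range(max_segments):
        segment = nav.step(observation)
        if not segment:  # an empty segment is treated as "stop" (assumption)
            break
        observation = execute(segment)  # environment returns the next observation
        trajectory.extend(segment)
    return trajectory


# Toy stand-ins: a 1-D "world" where each predicted transition moves one unit forward.
toy_nav = ClosedLoopNavigator(
    encode=lambda obs: [float(obs)],
    predict=lambda ctx: [ctx[-1] + 1.0],
    decode=lambda latents: [(latents[0], 0.0, 0.0)],
)
trajectory = run_episode(
    toy_nav,
    reset=lambda: 0.0,
    execute=lambda segment: segment[-1][0],
    max_segments=3,
)
print(trajectory)
```

In this toy setup the context grows by one entry per executed segment, which mirrors the card's point that each new observation conditions the next prediction.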