nielsr HF Staff committed on
Commit 9195d3c · verified · 1 Parent(s): 72ceef4

Add library_name to metadata and improve model card links


Hi! I'm Niels from the community science team at Hugging Face.

This PR improves the model card by adding `library_name: transformers` to the YAML metadata. Based on the model configuration and the provided documentation, this model is compatible with the `transformers` library (version 4.57.0 or higher). Adding this metadata enables the "Use in Transformers" button and automated code snippets on the Hub.
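
To check the metadata change locally, here is a minimal sketch that reads the YAML front matter of a model card and looks up `library_name`. The helper `parse_front_matter` and the constant `README_AFTER` are hypothetical names introduced for illustration. The parser is deliberately simplified: it only handles the flat `key: value` and `key:` plus `- item` shapes that appear in this diff, and it is not a general YAML parser. In practice, the `ModelCard` class from `huggingface_hub` is the more robust tool for this.

```python
# Simplified front-matter reader (assumption: flat keys and simple lists only).
def parse_front_matter(readme_text: str) -> dict:
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the card has no front matter block

    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter of the front matter
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(metadata.get(current_key), list):
                metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means list items follow on the next lines.
            metadata[current_key] = value if value else []
    return metadata


# Front matter as it appears on the "+" side of this diff.
README_AFTER = """---
base_model:
- Qwen/Qwen3-VL-4B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- autonomous-driving
- vision-language-action
- chain-of-thought
- trajectory-prediction
- VLA
---

# OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
"""

metadata = parse_front_matter(README_AFTER)
print(metadata["library_name"])  # transformers
```

Running the same helper on the pre-change front matter would return a dictionary without a `library_name` key, which is the gap this PR closes.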

The rest of the model card remains highly detailed, providing excellent documentation for the paper, architecture, and usage.

Files changed (1)
  1. README.md +7 -33
README.md CHANGED
@@ -1,16 +1,17 @@
 ---
-license: apache-2.0
+base_model:
+- Qwen/Qwen3-VL-4B-Instruct
 language:
 - en
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 tags:
 - autonomous-driving
 - vision-language-action
 - chain-of-thought
 - trajectory-prediction
 - VLA
-base_model:
-- Qwen/Qwen3-VL-4B-Instruct
-pipeline_tag: image-text-to-text
 ---
 
 # OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
@@ -90,15 +91,6 @@ Staged training is essential — ablation shows that skipping it collapses PDM-s
 | AR CoT+Answer | 2.99 | 8.54 | 3.51 |
 | **OneVL** | **2.62** | 7.53 | **3.26** |
 
-### CoT Text Quality (NAVSIM)
-
-| Method | Meta Action Acc. ↑ | STS Score ↑ | LLM Judge ↑ | Latency (s) ↓ |
-|---|:---:|:---:|:---:|:---:|
-| AR CoT+Answer | 73.20 | 79.75 | 81.86 | 6.58 |
-| **OneVL** | 71.00 | 78.26 | 79.13 | **4.46** |
-
-OneVL's language auxiliary decoder recovers 97% of explicit CoT quality at answer-only inference speed.
-
 ---
 
 ## Usage
@@ -144,25 +136,7 @@ python infer_onevl.py \
 --c_thought_visual 4 --max_visual_tokens 2560
 ```
 
-### Multi-GPU Inference
-
-```bash
-export MODEL_PATH=/path/to/OneVL-checkpoint
-export TEST_SET_PATH=test_data/navsim_test.json
-export OUTPUT_PATH=output/navsim/navsim_results.json
-bash run_infer.sh
-```
-
-Per-benchmark scripts are available in `scripts/`:
-
-```bash
-bash scripts/infer_navsim.sh
-bash scripts/infer_ar1.sh
-bash scripts/infer_roadwork.sh
-bash scripts/infer_impromptu.sh
-```
-
-For full documentation, evaluation scripts, and data format details, see the [GitHub repository](https://github.com/xiaomi-research/onevl).
+For full documentation, evaluation scripts, and data format details, see the [official GitHub repository](https://github.com/xiaomi-research/onevl).
 
 ---
 
@@ -195,4 +169,4 @@ For full documentation, evaluation scripts, and data format details, see the [Gi
 
 Released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
 
-Model weights are built on [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and the visual tokenizer is from [Emu3.5-VisionTokenizer](https://huggingface.co/BAAI/Emu3.5-VisionTokenizer); please refer to their respective licenses as well.
+Model weights are built on [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) and the visual tokenizer is from [Emu3.5-VisionTokenizer](https://huggingface.co/BAAI/Emu3.5-VisionTokenizer); please refer to their respective licenses as well.