Add pipeline tag and library name to model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
 
 # VST-7B-RL
@@ -34,7 +36,7 @@ We introduce **Visual Spatial Tuning (VST)**, a comprehensive framework designed
 ✨ **VST-P**: 4.1M samples across 19 skills, spanning single images, multi-image scenarios, and videos—boosting spatial perception in VLMs.
 ✨ **VST-R**: 135K curated samples that teach models to reason in space, including step-by-step reasoning and rule-based data for reinforcement learning.
 ✨ **Progressive Training Pipeline**: Start with supervised fine-tuning to build foundational spatial knowledge, then reinforce spatial reasoning abilities via RL. VST achieves state-of-the-art results on spatial benchmarks (34.8% on MMSI-Bench, 61.2% on VSIBench) without compromising general capabilities.
-✨ **Vision-Language-Action Models Enhanced**: The VST paradigm significantly strengthens spatial tuning, paving the way for more physically grounded AI.
+✨ **Vision-Language-Action Models Enhanced**: The VST paradigm significantly strengthens robotic learning, paving the way for more physically grounded AI.
 
 
 
@@ -149,4 +151,4 @@ If you find our work helpful, feel free to give us a cite.
 journal={arXiv preprint arXiv:2511.05491},
 year={2025}
 }
-```
+```
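
For reviewers who want to sanity-check the metadata change, here is a minimal sketch that reads the front matter block from the updated README and confirms the two new keys are present. It is an illustration only: `parse_front_matter` and `README_AFTER` are hypothetical names introduced here, and the parser handles only flat `key: value` lines, whereas real model cards use full YAML.

```python
# Hypothetical helper (not part of this PR or of any Hub library):
# parse flat "key: value" front matter delimited by "---" lines.
def parse_front_matter(readme_text: str) -> dict:
    lines = readme_text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return metadata
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the block as absent.
    return {}


# The first lines of README.md as they appear after this change.
README_AFTER = (
    "---\n"
    "license: apache-2.0\n"
    "library_name: transformers\n"
    "pipeline_tag: image-text-to-text\n"
    "---\n"
    "\n"
    "# VST-7B-RL\n"
)

metadata = parse_front_matter(README_AFTER)
for key in ("license", "library_name", "pipeline_tag"):
    print(f"{key}: {metadata.get(key, '<missing>')}")
```

Running the same check against the pre-change README would report `library_name` and `pipeline_tag` as missing, which is the gap this PR closes.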