Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ library_name: transformers
 pipeline_tag: image-text-to-text
 ---
 
-# VST-7B-RL
+# Visual Spatial Tuning: VST-7B-RL
 
 <p align="left">
 <a href="https://yangr116.github.io/vst_project/">
@@ -27,8 +27,9 @@ pipeline_tag: image-text-to-text
 </a>
 </p>
 
+This model is described in the paper [Visual Spatial Tuning](https://huggingface.co/papers/2511.05491).
 
-
+TL;DR: VST is a comprehensive framework designed to cultivate Vision-Language Models (VLMs) with human-like visuospatial abilities—from spatial perception to advanced reasoning.
 
 
 ## 💡 Key Highlights
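Side note on the unchanged front matter: `library_name: transformers` together with `pipeline_tag: image-text-to-text` means the model is intended to be served through the transformers `image-text-to-text` pipeline. Below is a minimal usage sketch under that assumption; the repo id is a placeholder, since this diff does not state it, and the image URL and prompt are illustrative only.

```python
# Minimal sketch, assuming the model card metadata above.
# "<org>/VST-7B-RL" is a hypothetical repo id, not confirmed by this diff.
from transformers import pipeline

pipe = pipeline(task="image-text-to-text", model="<org>/VST-7B-RL")

# Chat-style input: one user turn containing an image plus a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/scene.jpg"},  # placeholder image
            {"type": "text", "text": "How many chairs are to the left of the table?"},
        ],
    }
]

print(pipe(text=messages, max_new_tokens=64))
```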