Update README.md
README.md CHANGED
@@ -4,7 +4,7 @@ library_name: transformers
 pipeline_tag: image-text-to-text
 ---
 
-# VST-7B-RL
+# Visual Spatial Tuning: VST-7B-RL
 
 <p align="left">
 <a href="https://yangr116.github.io/vst_project/">
@@ -27,8 +27,9 @@ pipeline_tag: image-text-to-text
 </a>
 </p>
 
+This model is described in the paper [Visual Spatial Tuning](https://huggingface.co/papers/2511.05491).
 
-
+TL;DR: VST is a comprehensive framework designed to cultivate Vision-Language Models (VLMs) with human-like visuospatial abilities—from spatial perception to advanced reasoning.
 
 
 ## 💡 Key Highlights
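Side note on the unchanged front matter: `library_name: transformers` together with `pipeline_tag: image-text-to-text` means the model is intended to be served through the transformers `image-text-to-text` pipeline. Below is a minimal usage sketch under that assumption; the repo id is a placeholder, since this diff does not state it, and the image URL and prompt are illustrative only.

```python
# Minimal sketch, assuming the model card metadata above.
# "<org>/VST-7B-RL" is a hypothetical repo id, not confirmed by this diff.
from transformers import pipeline

pipe = pipeline(task="image-text-to-text", model="<org>/VST-7B-RL")

# Chat-style input: one user turn containing an image plus a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/scene.jpg"},  # placeholder image
            {"type": "text", "text": "How many chairs are to the left of the table?"},
        ],
    }
]

print(pipe(text=messages, max_new_tokens=64))
```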