Update README.md
README.md
CHANGED
@@ -13,6 +13,8 @@ tags:
 - vision-understanding
 ---
 
+
+
 # **Spatial-VU**
 
 > The **Spatial-VU** model is a fine-tuned version of **Qwen2.5-VL-7B-Instruct**, tailored for **Spatial Reasoning and Vision Understanding**. This variant is designed to generate highly detailed and descriptive captions across a broad range of visual categories, including images with complex, sensitive, or nuanced content—across varying aspect ratios and resolutions.
@@ -93,4 +95,4 @@ print(output_text)
 * May produce explicit, sensitive, or offensive descriptions depending on image content and prompts.
 * Not suitable for deployment in production systems requiring content filtering or moderation.
 * Can exhibit variability in caption tone or style depending on input prompt phrasing.
-* Accuracy for unfamiliar or synthetic visual styles may vary.
+* Accuracy for unfamiliar or synthetic visual styles may vary.