Update README.md
[ViCA2 on Hugging Face](https://huggingface.co/nkkbr/ViCA2)
[Paper (arXiv:2505.12312)](https://arxiv.org/abs/2505.12312)
## Overview
**ViCA-7B** is a vision-language model specifically fine-tuned for *visuospatial reasoning* in indoor video environments. Built upon the LLaVA-Video-7B-Qwen2 architecture, it is trained using our newly proposed **ViCA-322K dataset**, which emphasizes both structured spatial annotations and complex instruction-based reasoning tasks.
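Because the model builds on LLaVA-Video-7B-Qwen2, inference prompts follow a ChatML-style (Qwen2) conversation format with one image placeholder per sampled video frame. The sketch below is an illustration only, not the released inference code: the exact template, placeholder token, and role tags are assumptions and should be checked against the model card's usage section.

```python
# Hedged sketch of a Qwen2/ChatML-style video-QA prompt builder.
# ASSUMPTIONS: the "<image>" placeholder token and the <|im_start|>/<|im_end|>
# role tags mirror LLaVA-Video-7B-Qwen2; the released code may differ.

def build_prompt(question: str, num_frames: int) -> str:
    """Return a chat-formatted prompt with one <image> token per sampled frame."""
    frame_tokens = "".join("<image>" for _ in range(num_frames))
    return (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n{frame_tokens}\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("How many chairs are visible in the room?", num_frames=4)
```

In this sketch each sampled frame contributes one placeholder that the vision tower later replaces with frame embeddings, and the prompt ends at the open assistant turn so the model generates the answer.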