# ViCA-7B: Visuospatial Cognitive Assistant

[arXiv:2505.12312](https://arxiv.org/abs/2505.12312)

> You may also be interested in our other project, **ViCA2**. Please refer to the following links:

[GitHub: nkkbr/ViCA](https://github.com/nkkbr/ViCA)

[Hugging Face: nkkbr/ViCA2](https://huggingface.co/nkkbr/ViCA2)

[arXiv:2505.12363](https://arxiv.org/abs/2505.12363)

## Overview

**ViCA-7B** is a vision-language model specifically fine-tuned for *visuospatial reasoning* in indoor video environments. Built upon the LLaVA-Video-7B-Qwen2 architecture, it is trained on our newly proposed **ViCA-322K dataset**, which emphasizes both structured spatial annotations and complex instruction-based reasoning tasks.