# ViCA-7B: Visuospatial Cognitive Assistant

[arXiv:2505.12312](https://arxiv.org/abs/2505.12312)

> You may also be interested in our other project, **ViCA2**. Please refer to the following links:

[GitHub: nkkbr/ViCA](https://github.com/nkkbr/ViCA)

[Hugging Face: nkkbr/ViCA2](https://huggingface.co/nkkbr/ViCA2)

[arXiv:2505.12363](https://arxiv.org/abs/2505.12363)

## Overview

**ViCA-7B** is a vision-language model specifically fine-tuned for *visuospatial reasoning* in indoor video environments. Built upon the LLaVA-Video-7B-Qwen2 architecture, it is trained on our newly proposed **ViCA-322K dataset**, which emphasizes both structured spatial annotations and complex instruction-based reasoning tasks.