nkkbr committed (verified) · Commit a0c9fc7 · Parent(s): e119ac2

Update README.md

Files changed (1): README.md (+2 −0)
README.md CHANGED

```diff
@@ -69,6 +69,8 @@ model-index:
 
 [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ViCA2-blue)](https://huggingface.co/nkkbr/ViCA2)
 
+[![arXiv](https://img.shields.io/badge/arXiv-2505.12312-B31B1B?logo=arxiv&link=https://arxiv.org/abs/2505.12312)](https://arxiv.org/abs/2505.12312)
+
 ## Overview
 
 **ViCA-7B** is a vision-language model specifically fine-tuned for *visuospatial reasoning* in indoor video environments. Built upon the LLaVA-Video-7B-Qwen2 architecture, it is trained using our newly proposed **ViCA-322K dataset**, which emphasizes both structured spatial annotations and complex instruction-based reasoning tasks.
```