nkkbr/ViCA
Video-Text-to-Text
Transformers
Safetensors
nkkbr/ViCA-322K
nkkbr/ViCA-thinking-2.68k
English
llava
text-generation
multimodal
vision-language
video understanding
spatial reasoning
visuospatial cognition
qwen
llava-video
Eval Results
arxiv: 2505.12312
arxiv: 2505.12363
License: apache-2.0
Commit History (branch: main)
11d7af5 (verified) · "Update README.md" · nkkbr committed 11 days ago
a0c9fc7 (verified) · "Update README.md" · nkkbr committed on Nov 14
e119ac2 (verified) · "Update README.md" · nkkbr committed on May 28
8185a7d (verified) · "Update README.md" · nkkbr committed on May 28
88a067a (verified) · "Update README.md" · nkkbr committed on May 20
eeb99d3 (verified) · "Update README.md" · nkkbr committed on May 15
e2d3083 · "." · nkkbr committed on May 8
c050d43 · "." · nkkbr committed on May 8
5828264 · "update readme" · nkkbr committed on May 7
edfde49 · "update readme" · nkkbr committed on May 7
738d65b · "update readme" · nkkbr committed on May 7
1f5a21f · "update readme" · nkkbr committed on May 7
7bb49d3 · "update readme" · nkkbr committed on May 7
2968096 · "upload raw evaluation outputs for analysis and reproducibility" · nkkbr committed on May 7
ed35572 · "update readme" · nkkbr committed on May 7
314ad38 · "Add model card with base_model and tags" · nkkbr committed on May 3
bb2bb30 · "Initial commit" · nkkbr committed on Apr 21
c028774 (verified) · "initial commit" · nkkbr committed on Apr 21