nkkbr/ViCA


nkkbr committed on May 7, 2025

Commit edfde49 · 1 parent: 738d65b

update readme

Files changed (1)

README.md +4 -4

@@ -230,10 +230,10 @@ Increasing input from 64 to 128 frames doubles the number of visual tokens (13,4
 ## Potential Applications
 ViCA-7B supports a broad range of spatially grounded multimodal applications:
-- **Indoor navigation assistants**
-- **Robotics planning and spatial querying**
-- **Smart room arrangement and AR layout analysis**
-- **Scene understanding for embodied AI agents**
+- Indoor navigation assistants
+- Robotics planning and spatial querying
+- Smart room arrangement and AR layout analysis
+- Scene understanding for embodied AI agents
 ## Known Limitations