DAMO-NLP-SG
/
VL3-SigLIP-NaViT
Language Technology Lab at Alibaba DAMO Academy
Image Feature Extraction
Transformers
Safetensors
English
videollama3_vision_encoder
feature-extraction
visual-encoder
multi-modal-large-language-model
custom_code
arxiv: 2501.13106
arxiv: 2406.07476
arxiv: 2306.02858
License: apache-2.0
Training details
#2 by lucasjin, opened Jan 23, 2025
Discussion
lucasjin
Jan 23, 2025
Are there any details on how this model was trained?