FastVLM (Collection): Efficient Vision Encoding for Vision Language Models • 8 items • Updated 11 days ago • 109
MobileCLIP2 (Collection): Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B • 27 items • Updated 11 days ago • 58
Nomic Embed Vision (Collection): Vision Encoders aligned to Nomic Embed Text making Nomic Embed multimodal! • 2 items • Updated Jun 5, 2024 • 10
Apollo: An Exploration of Video Understanding in Large Multimodal Models (Paper) • 2412.10360 • Published Dec 13, 2024 • 147