jina-embeddings-v5-omni Collection Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each. • 27 items • Updated 4 days ago • 31
GenLIP Collection Model weights of paper "Let ViT Speak: Generative Language-Image Pre-training" • 6 items • Updated 11 days ago • 5
TIPSv2 Collection TIPSv2 foundational vision-language models. Webpage: https://gdm-tipsv2.github.io/ • 9 items • Updated Apr 14 • 30
Falcon Perception Collection Falcon-Perception and Falcon-OCR model: early-fusion, natively multimodal, dense Autoregressive Transformer models. • 5 items • Updated Apr 6 • 14
GME Models Collection General Multimodal Embedding Models Released by Tongyi Lab of Alibaba Group • 3 items • Updated Dec 24, 2024 • 10
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 36 items • Updated Jan 28 • 20
view article Article SmolVLM2: Bringing Video Understanding to Every Device +5 orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 337
view article Article SmolVLM - small yet mighty Vision Language Model +3 andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 417
view article Article Vision Language Model Alignment in TRL ⚡️ +3 sergiopaniego, merve, qgallouedec, kashif, ariG23498 • Aug 7, 2025 • 111
view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL +4 toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101
VideoLLaMA3 Collection Frontier Multimodal Foundation Models for Video Understanding • 9 items • Updated Mar 2 • 16
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 49
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 93
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 159
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 43 items • Updated Mar 2 • 720