Article SigLIP 2: A better multilingual vision language encoder • ariG23498, merve, qubvel-hf, +1 • Feb 21, 2025 • 217
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 36 items • Updated Jan 28 • 20
Fashion-VDM: Video Diffusion Model for Virtual Try-On Paper • 2411.00225 • Published Oct 31, 2024 • 11