DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 15 items • Updated Mar 10 • 633
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 164
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 96
view article Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models +1 andito, merve, SkalskiP • Jun 24, 2024 • 207
MVDream: Multi-view Diffusion for 3D Generation Paper • 2308.16512 • Published Aug 31, 2023 • 106
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 107