Key models for robotic computer vision, object grounding, 3D reconstruction, and sim-to-real transfer
nvidia/LocateAnything-3B
Image-Text-to-Text • 4B params • 728k downloads • 2.48k likes
facebook/dinov2-large
Image Feature Extraction • 0.3B params • 974k downloads • 113 likes
microsoft/Florence-2-large
Image-Text-to-Text • 0.8B params • 660k downloads • 1.83k likes
facebook/sam2-hiera-large
Mask Generation • 0.2B params • 12.4k downloads • 140 likes