Key models for robotic computer vision, object grounding, 3D reconstruction, and sim-to-real transfer