Why Far Looks Up: Probing Spatial Representation in Vision-Language Models • Paper • 2605.30161 • Published 29 days ago • 60
RoboTwin 2.0 • Collection • Basic training setting on RoboTwin 2.0 (RGB-only input, predicting absolute joint values directly; trained on the official dataset without pretraining) • 2 items • Updated Mar 19 • 1