---
tags:
- robot manipulation
- multi-modal perception
- vision-language-action
---

# UniLACT

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models.

## Abstract

Latent action representations learned from unlabeled videos have recently emerged as a promising paradigm for pretraining vision-language-action (VLA) models without explicit robot action supervision. However, latent actions derived solely from RGB observations primarily encode appearance-driven dynamics and lack explicit 3D geometric structure, which is essential for precise, contact-rich manipulation. To address this limitation, we introduce UniLACT, a transformer-based VLA model that incorporates geometric structure through depth-aware latent pretraining, enabling downstream policies to inherit stronger spatial priors. To support this pretraining, we propose UniLARN, a unified latent action learning framework based on inverse and forward dynamics objectives that learns a shared embedding space for RGB and depth while explicitly modeling their cross-modal interactions. This formulation produces modality-specific and unified latent action representations that serve as pseudo-labels for the depth-aware pretraining of UniLACT. Extensive experiments in both simulation and real-world settings demonstrate the effectiveness of depth-aware unified latent action representations: UniLACT consistently outperforms RGB-based latent action baselines under both in-domain and out-of-domain pretraining regimes, and on both seen and unseen manipulation tasks.
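
The abstract describes UniLARN only at a high level. The sketch below is a minimal, untrained forward pass showing how inverse- and forward-dynamics objectives can yield modality-specific and unified latent action pseudo-labels. It assumes a VQ-style discrete codebook shared by RGB and depth (a common choice for latent action models), random linear maps in place of learned encoders, and concatenation followed by a linear layer as a stand-in for cross-modal interaction. All names and sizes (`W_idm`, `W_fuse`, `W_fdm`, `CODEBOOK_SIZE`, and so on) are hypothetical; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these come from the paper.
FEAT_DIM = 16        # per-frame feature size for each modality
LATENT_DIM = 8       # latent action embedding size
CODEBOOK_SIZE = 32   # number of discrete latent actions
MODALITIES = ("rgb", "depth")


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a learned network (assumption)."""
    return rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))


# Inverse dynamics: (frame_t, frame_t+k) -> continuous latent, one encoder per modality.
W_idm = {m: linear(2 * FEAT_DIM, LATENT_DIM) for m in MODALITIES}
# Hypothetical fusion of RGB and depth latents into a unified latent.
W_fuse = linear(2 * LATENT_DIM, LATENT_DIM)
# Shared codebook: RGB, depth, and unified latents are quantized into one space.
codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))
# Forward dynamics: (frame_t, quantized latent) -> predicted frame_t+k, per modality.
W_fdm = {m: linear(FEAT_DIM + LATENT_DIM, FEAT_DIM) for m in MODALITIES}


def quantize(z):
    """Nearest-codebook lookup; returns (code indices, quantized vectors)."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]


def latent_actions(obs_t, obs_tk):
    """obs_t / obs_tk map modality name -> (batch, FEAT_DIM) features."""
    z = {m: np.concatenate([obs_t[m], obs_tk[m]], axis=1) @ W_idm[m] for m in MODALITIES}
    z["unified"] = np.concatenate([z["rgb"], z["depth"]], axis=1) @ W_fuse
    return {name: quantize(v) for name, v in z.items()}


def fdm_loss(obs_t, obs_tk, codes):
    """Mean-squared forward-dynamics error, predicting each modality's next frame
    from its own quantized latent and from the unified one (an assumed objective)."""
    loss = 0.0
    for m in MODALITIES:
        for source in (m, "unified"):
            _, zq = codes[source]
            pred = np.concatenate([obs_t[m], zq], axis=1) @ W_fdm[m]
            loss += float(np.mean((pred - obs_tk[m]) ** 2))
    return loss


# Usage on random features for a batch of 4 frame pairs.
batch = 4
obs_t = {m: rng.normal(size=(batch, FEAT_DIM)) for m in MODALITIES}
obs_tk = {m: rng.normal(size=(batch, FEAT_DIM)) for m in MODALITIES}
codes = latent_actions(obs_t, obs_tk)
pseudo_labels = {name: idx for name, (idx, _) in codes.items()}
print({name: idx.tolist() for name, idx in pseudo_labels.items()})
print("FDM loss:", fdm_loss(obs_t, obs_tk, codes))
```

In a trained system the encoders, codebook, and decoders would be optimized jointly on the forward-dynamics loss (for example, with a straight-through estimator across the quantizer). The resulting code indices in `pseudo_labels` are the kind of discrete targets a VLA model could be pretrained to predict in place of ground-truth robot actions.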
## Citation

```bibtex
@misc{govind2026unilactdepthawarergblatent,
  title={UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models},
  author={Manish Kumar Govind and Dominick Reilly and Pu Wang and Srijan Das},
  year={2026},
  eprint={2602.20231},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.20231}
}