arxiv:2510.00726

CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation

Published on Oct 1, 2025

Abstract

AI-generated summary: A novel State Transition Attention mechanism enhances robotic policy learning by modeling the temporal structure of demonstration sequences, improving adaptation to execution variations through structured attention and temporal masking.

Learning robotic manipulation policies through supervised learning from demonstrations remains challenging when policies encounter execution variations not explicitly covered during training. While incorporating historical context through attention mechanisms can improve robustness, standard approaches process all past states in a sequence without explicitly modeling the temporal structure that demonstrations may include, such as failure and recovery patterns. We propose a Cross-State Transition Attention Transformer that employs a novel State Transition Attention (STA) mechanism to modulate standard attention weights based on learned state evolution patterns, enabling policies to better adapt their behavior based on execution history. Our approach combines this structured attention with temporal masking during training, where visual information is randomly removed from recent timesteps to encourage temporal reasoning from historical context. Evaluation in simulation shows that STA consistently outperforms standard attention approaches and temporal modeling methods such as TCN and LSTM networks, achieving more than a 2x improvement over cross-attention on precision-critical tasks. The source code and data can be accessed at https://github.com/iit-DLSLab/croSTAta
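
The abstract describes STA only at a high level. As one plausible reading, the PyTorch sketch below biases cross-attention logits over the state history with per-head scores computed from consecutive state transitions. The class name, shapes, and gating scheme are assumptions for illustration, not the implementation from the croSTAta repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateTransitionAttention(nn.Module):
    """Hypothetical sketch: cross-attention whose logits are modulated by
    learned scores over consecutive-state transitions. The exact gating in
    the paper may differ; this is an assumed formulation."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Scores each state transition (s_t - s_{t-1}); one bias per head.
        self.transition_score = nn.Linear(dim, num_heads)

    def forward(self, query: torch.Tensor, states: torch.Tensor) -> torch.Tensor:
        # query: (B, Tq, D) current policy tokens; states: (B, Ts, D) history.
        B, Ts, D = states.shape
        q = self.q_proj(query).view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(states).view(B, Ts, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(states).view(B, Ts, self.num_heads, self.head_dim).transpose(1, 2)

        logits = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (B, H, Tq, Ts)

        # Transition features: difference between consecutive states,
        # with the first step prepended so the delta at t=0 is zero.
        deltas = torch.diff(states, dim=1, prepend=states[:, :1])  # (B, Ts, D)
        bias = self.transition_score(deltas)                       # (B, Ts, H)
        logits = logits + bias.permute(0, 2, 1).unsqueeze(2)       # broadcast over Tq

        attn = F.softmax(logits, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, -1, D)
        return self.out_proj(out)
```

The temporal-masking scheme can be sketched the same way: during training, visual features at a few of the most recent timesteps are randomly dropped so the policy must infer the current situation from older context. The window size, drop probability, and zero-masking below are assumed details, not the paper's exact recipe.

```python
import torch

def mask_recent_visual_tokens(visual_feats: torch.Tensor,
                              recent_window: int = 4,
                              drop_prob: float = 0.5) -> torch.Tensor:
    """Hypothetical training-time augmentation: randomly zero out visual
    features in the most recent timesteps to force temporal reasoning
    from historical context.

    visual_feats: (B, T, D) per-timestep visual embeddings.
    """
    B, T, _ = visual_feats.shape
    window = min(recent_window, T)
    # Bernoulli keep-mask over the last `window` steps of each sequence.
    keep = (torch.rand(B, window, 1, device=visual_feats.device) > drop_prob).float()
    masked = visual_feats.clone()
    masked[:, T - window:] = masked[:, T - window:] * keep
    return masked
```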
