arxiv:2603.19199

FASTER: Rethinking Real-Time Flow VLAs

Published on Mar 19 · Submitted by Yuxiang Lu on Mar 20
#3 Paper of the day

Abstract

AI-generated summary: Fast Action Sampling for ImmediaTE Reaction (FASTER) reduces real-time reaction latency in Vision-Language-Action models by adapting sampling schedules to prioritize immediate actions while maintaining long-horizon trajectory quality.

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, which forms the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold (e.g., in π_{0.5} and X-VLA) into a single step, while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
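
The uniform reaction-time claim above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: a new inference is assumed to start once per execution horizon, an event is assumed to land at a uniformly random moment within that cycle, and the robot is assumed to react as soon as the next inference emits its first action. Under this toy model the reaction time is uniform on [TTFA, TTFA + horizon]; the exact bounds derived in the paper may differ. The function name and the numeric values are hypothetical.

```python
import random


def simulate_reaction_times(ttfa_s, horizon_s, n_events=100_000, seed=0):
    """Sample reaction times under a toy asynchronous-chunking model.

    Assumptions (illustrative, not from the paper): inference restarts every
    `horizon_s` seconds, and an event arrives at a uniformly random moment
    inside that cycle. The robot can only react once the next inference has
    produced its first action, which takes `ttfa_s` seconds.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_events):
        # Time the event has to wait until the next inference call begins.
        wait_for_next_inference = rng.uniform(0.0, horizon_s)
        samples.append(wait_for_next_inference + ttfa_s)
    return samples


# Hypothetical numbers: 50 ms to first action, 400 ms execution horizon.
samples = simulate_reaction_times(ttfa_s=0.05, horizon_s=0.4)
mean_reaction = sum(samples) / len(samples)
print(f"min={min(samples):.3f}s mean={mean_reaction:.3f}s max={max(samples):.3f}s")
```

In this model, lowering TTFA shifts the whole distribution earlier, while shortening the execution horizon narrows it, which is one way to read the claim that both quantities jointly determine reaction time.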

Community

Paper submitter

Real-time reaction in VLAs is constrained not only by inference latency, but also by how action chunks are generated and executed. FASTER introduces a new paradigm for fast action sampling under asynchronous execution. By compressing the sampling process for immediate reaction into a single step, FASTER achieves 10x acceleration over π_{0.5} and X-VLA, enabling real-time responsiveness in highly dynamic tasks such as table tennis.
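
To make the idea of finishing near-term actions first more concrete, here is a hypothetical per-action schedule. It is not the paper's Horizon-Aware Schedule, whose exact form is not given on this page; it is a simple linear ramp chosen for illustration. The function name, the ramp rule, and the convention that flow time 0 is pure noise and 1 is a fully denoised action are all assumptions.

```python
def horizon_aware_times(chunk_len, num_steps):
    """Return times[j][i]: flow time of action i after sampling step j + 1.

    Hypothetical rule (not from the paper): action i finishes denoising at
    step 1 + round((num_steps - 1) * i / (chunk_len - 1)), so the first
    action is done after one step and the last one after `num_steps` steps.
    A constant schedule would instead finish every action at `num_steps`.
    """
    times = []
    for step in range(1, num_steps + 1):
        row = []
        for i in range(chunk_len):
            frac = i / (chunk_len - 1) if chunk_len > 1 else 0.0
            finish_step = 1 + round((num_steps - 1) * frac)
            # Flow time advances linearly and saturates at 1.0 (fully denoised).
            row.append(min(1.0, step / finish_step))
        times.append(row)
    return times


schedule = horizon_aware_times(chunk_len=5, num_steps=10)

# First sampling step at which each action in the chunk is fully denoised.
first_done = [
    next(j + 1 for j, row in enumerate(schedule) if row[i] >= 1.0)
    for i in range(5)
]
print(first_done)
```

With a schedule of this shape, the first action could be streamed to the robot after a single sampling step while later actions keep refining, which matches the intuition in the comment above.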

lowkey the horizon-aware schedule is the clever punchline here, it concentrates denoising on near-term actions so ttfa becomes a one-step quick reaction while the rest of the horizon preserves long-horizon quality.
my main curiosity is how well that uniform reaction-time picture holds when a sudden perturbation demands a mid-chunk adjustment beyond the first fast step.
a small but important sanity check would be an ablation on the streaming latency vs server work, i worry jitter and backpressure could tilt the balance away from the near-term bias in real hardware.
the arxivlens breakdown helped me parse the method details, i found their walkthrough on the horizon-aware schedule and the early-stop pipeline pretty clarifying: https://arxivlens.com/PaperView/Details/faster-rethinking-real-time-flow-vlas-7803-b8cb32e4
