Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems
Abstract
Physical AI systems face safety challenges where black-box models can execute harmful actions without detection, necessitating comprehensive runtime guardrail mechanisms for safe operation.
Physical AI systems increasingly map multimodal observations, language instructions, and learned world representations into physically consequential actions. Robotics foundation models, vision-language-action models, and world-model-based autonomous systems can condition decisions that move vehicles, robots, drones, and industrial machines. This transition exposes a safety problem that is not fully captured by conventional AI content moderation or by classical robot safety alone: a black-box model may issue a physically consequential action while appearing confident, plausible, and semantically aligned. The resulting failure can be silent, arising from sensor drift, occlusion, state-estimation error, distribution shift, hallucinated affordances, or invalid physical assumptions before downstream hardware controllers detect a violation. Across embodied foundation models, world models, robotics simulation, embodied safety benchmarks, safe control, runtime assurance, uncertainty estimation, verification, and guardrail evaluation, model capability and safety mechanisms have advanced along largely separate technical tracks. A recurring gap synthesized here is that no single stream surveyed in this review supplies a complete runtime authorization boundary between black-box Physical AI models and physical execution. The resulting analysis develops a bounded problem formulation, a definition of silent physical-action failure, a taxonomy of runtime guardrail functions, and evaluation requirements for comparing guardrails as Physical AI assurance mechanisms.
Community
The synthesized gap (that no surveyed stream supplies a complete runtime authorization boundary) lands, and the word doing the work is "complete." A boundary stays incomplete as long as the authorize/deny decision can be read from something the same system can silently move: the model's own confidence, or a policy/config the pipeline itself can overwrite. Then the silent failure you define isn't removed; it relocates one layer up, where the guardrail confidently authorizes because its authority source was quietly wrong.
What makes it complete is resolving authorization to something attributable and external to the black box: a signed grant binding {who may authorize} × {action class} × {physical envelope}, checked against the operation actually being attempted rather than the model's self-report. Absence of a verifiable grant becomes a deny — fail-closed — and a deny is a visible, auditable event instead of a confident wrong action. That turns "silent" into "loud."
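A minimal sketch of that check, under loud assumptions: the field names (`authorizer`, `action_class`, `envelope`), the `Decision` type, and both helper functions are hypothetical, and an HMAC over a shared key stands in for whatever signature scheme a real deployment would use (more likely an asymmetric one, so the verifier cannot mint grants). The point it tries to show is only that the decision is computed from the grant and the attempted operation, never from the model's confidence, and that every missing or unverifiable input resolves to a deny with a recorded reason.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # kept on every decision so denies are auditable


def sign_grant(grant: dict, key: bytes) -> str:
    """Hypothetical signer: HMAC-SHA256 over a canonical JSON encoding."""
    payload = json.dumps(grant, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def authorize(operation: dict, grant: Optional[dict],
              signature: Optional[str], key: bytes) -> Decision:
    """Fail-closed check of an attempted operation against a signed grant."""
    if grant is None or signature is None:
        return Decision(False, "no grant presented")
    expected = sign_grant(grant, key)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return Decision(False, "grant signature invalid")
    if operation.get("action_class") != grant.get("action_class"):
        return Decision(False, "action class not covered by grant")
    # Every envelope limit must be present in the operation and respected.
    params = operation.get("params", {})
    for name, limit in grant.get("envelope", {}).items():
        value = params.get(name)
        if value is None or abs(value) > limit:
            return Decision(False, f"{name} outside envelope")
    return Decision(True, f"granted by {grant.get('authorizer')}")


if __name__ == "__main__":
    key = b"demo-key"  # assumption: provisioned outside the model's pipeline
    grant = {"authorizer": "operator-1", "action_class": "move_arm",
             "envelope": {"speed_mps": 0.5}}
    sig = sign_grant(grant, key)
    op = {"action_class": "move_arm", "params": {"speed_mps": 0.3}}
    print(authorize(op, grant, sig, key))
    print(authorize(op, None, None, key))
```

Note that the model's self-reported confidence does not appear anywhere in `authorize`; that omission is the design choice being illustrated.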
One addition to the evaluation requirements: test what the guardrail does when its own authority source is unavailable or stale. A guardrail that fails open when its policy store or verifier is down reintroduces the silent-failure class at the assurance layer — in a physical system, that's the arm still moving while the gatekeeper is offline. Fail-closed-under-degradation belongs alongside the coverage and latency axes.
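One way such a degradation test could look, as a sketch rather than a reference harness: `guarded_decision`, the policy dictionary layout (`allowed`, `fetched_at`), and the two-second freshness bound are all hypothetical names and values chosen for illustration. The guard treats an unreachable or stale authority source as a deny, and the checks at the bottom exercise exactly the cases named above.

```python
import time
from typing import Callable, Optional, Tuple

MAX_POLICY_AGE_S = 2.0  # hypothetical freshness bound, not from the paper


def guarded_decision(fetch_policy: Callable[[], Optional[dict]],
                     action_class: str,
                     now: Optional[float] = None,
                     max_age_s: float = MAX_POLICY_AGE_S) -> Tuple[bool, str]:
    """Return (allowed, reason); any doubt about the authority source denies."""
    now = time.monotonic() if now is None else now
    try:
        policy = fetch_policy()
    except Exception as exc:  # store or verifier is down
        return False, f"deny: authority source unavailable ({type(exc).__name__})"
    if policy is None:
        return False, "deny: no policy returned"
    if now - policy["fetched_at"] > max_age_s:
        return False, "deny: policy stale"
    if action_class not in policy["allowed"]:
        return False, "deny: action class not allowed"
    return True, "allow"


def store_down() -> dict:
    """Simulated outage of the policy store."""
    raise ConnectionError("policy store offline")


def store_ok() -> dict:
    """Simulated healthy store; fetched_at uses the same clock as `now`."""
    return {"allowed": {"move_arm"}, "fetched_at": 100.0}


if __name__ == "__main__":
    # Degradation cases: each must deny, and each reason is visible.
    print(guarded_decision(store_down, "move_arm", now=100.5))
    print(guarded_decision(store_ok, "move_arm", now=110.0))
    # Healthy case for contrast.
    print(guarded_decision(store_ok, "move_arm", now=100.5))
```

A benchmark built on this idea would report the fraction of injected outages and stale reads that end in a deny, next to the coverage and latency numbers.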