DEF-attackvla — ANIMA Defense Guard for VLA Models

CUDA-accelerated adversarial defense model that detects and blocks attacks on Vision-Language-Action (VLA) robot models.

Model

DefenseNet (0.63M parameters) — lightweight defense guard that sits in front of any VLA model:

PatchDetectorHead: Uses CUDA local_tv_map kernel to extract per-pixel total variation features, detecting adversarial patches
ImageAnomalyClassifier: Uses CUDA fused_dual_normalize kernel for VLA-compatible DINOv2+SigLIP dual normalization, then classifies clean vs adversarial
RandomizedSmoothing: Uses CUDA fused_smooth_clamp kernel for inference-time noise+clamp in a single pass
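The three kernels above are not documented in detail here, so the following is only a plain-PyTorch reference sketch of what such operations could compute. The function names mirror the kernel names, but the exact formulas, the channel-stacked output layout, and the `sigma` default are assumptions rather than the shipped implementation. The normalization constants are the standard ImageNet statistics used by DINOv2 and the 0.5/0.5 statistics used by SigLIP.

```python
import torch
import torch.nn.functional as F

# Standard ImageNet statistics (DINOv2) and SigLIP statistics.
DINO_MEAN, DINO_STD = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)
SIGLIP_MEAN, SIGLIP_STD = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)


def local_tv_map(x: torch.Tensor) -> torch.Tensor:
    """Assumed per-pixel total variation: |dx| + |dy|, averaged over channels.

    x has shape (B, C, H, W); the result has shape (B, 1, H, W).
    """
    dx = (x[..., :, 1:] - x[..., :, :-1]).abs()
    dy = (x[..., 1:, :] - x[..., :-1, :]).abs()
    # Zero-pad the last column / last row so the map keeps the input size.
    dx = F.pad(dx, (0, 1, 0, 0))
    dy = F.pad(dy, (0, 0, 0, 1))
    return (dx + dy).mean(dim=1, keepdim=True)


def dual_normalize(x: torch.Tensor) -> torch.Tensor:
    """Normalize an RGB batch in [0, 1] for both encoders.

    Returns (B, 6, H, W): DINOv2-normalized channels first, then SigLIP.
    Stacking on the channel axis is an assumption made for this sketch.
    """
    def norm(mean, std):
        m = x.new_tensor(mean).view(1, 3, 1, 1)
        s = x.new_tensor(std).view(1, 3, 1, 1)
        return (x - m) / s

    return torch.cat([norm(DINO_MEAN, DINO_STD), norm(SIGLIP_MEAN, SIGLIP_STD)], dim=1)


def smooth_clamp(x: torch.Tensor, sigma: float = 0.05, generator=None) -> torch.Tensor:
    """Add Gaussian noise and clamp back to [0, 1]; sigma is a hypothetical default."""
    noise = torch.randn(x.shape, generator=generator, device=x.device, dtype=x.dtype)
    return (x + sigma * noise).clamp(0.0, 1.0)
```

A high-contrast patch produces large values in the total variation map along its edges, which is the kind of signal a patch detector head can pick up. A fused CUDA kernel would perform each of these steps in one pass instead of several tensor operations.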

Training

Data: 250,619 real LIBERO robot manipulation frames (273K) + COCO natural images (5K)
Attacks: 5 types — UPA (universal patch), BackdoorVLA trigger (blue cube), Gaussian noise, checkerboard, colored square
Epochs: 41 (early stopped), val_acc=99.95%
Hardware: NVIDIA L4 (23GB), CUDA 12.8, torch cu128
CUDA kernels: 3 custom ops compiled for sm_89 (L4), active in every forward pass
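
To make the synthetic attack types concrete, here is a minimal sketch of how a checkerboard or colored-square patch could be pasted onto a clean frame to create an adversarial training sample. The helper names, patch size, cell size, and placement are all hypothetical; the training config does not specify them here.

```python
import torch


def checkerboard_patch(size: int, cell: int = 4) -> torch.Tensor:
    """Black/white checkerboard of shape (3, size, size) with values in {0, 1}."""
    idx = torch.arange(size) // cell
    board = ((idx[:, None] + idx[None, :]) % 2).float()
    return board.expand(3, size, size).clone()


def colored_square(size: int, rgb=(0.0, 0.0, 1.0)) -> torch.Tensor:
    """Solid-color square of shape (3, size, size); blue by default."""
    return torch.tensor(rgb).view(3, 1, 1).expand(3, size, size).clone()


def paste_patch(image: torch.Tensor, patch: torch.Tensor, top: int, left: int) -> torch.Tensor:
    """Return a copy of image (3, H, W) with patch pasted at (top, left)."""
    _, ph, pw = patch.shape
    out = image.clone()
    out[:, top:top + ph, left:left + pw] = patch
    return out


# Hypothetical example: a gray frame with a 16x16 checkerboard patch.
clean = torch.full((3, 224, 224), 0.5)
adversarial = paste_patch(clean, checkerboard_patch(16), top=100, left=60)
```

Each clean frame paired with a patched copy gives one negative and one positive example for the clean-vs-adversarial classifier.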

Evaluation (on real LIBERO task suites)

Task Suite	Accuracy	TPR	FPR
libero_long (tasks 0-9)	97.4%	95.4%	0.6%
libero_object (tasks 10-19)	97.6%	95.8%	0.6%
libero_spatial (tasks 20-29)	97.8%	96.2%	0.6%
libero_goal (tasks 30-39)	97.6%	95.9%	0.6%
Overall	97.6%	95.8%	0.6%
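
For reference, the three columns follow the usual detection definitions, with "adversarial" treated as the positive class (an assumption about the labeling convention). The sketch below computes them from confusion counts; the counts in the example are invented for illustration and are not taken from libero_eval_report.json.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, true positive rate, and false positive rate from confusion counts.

    Positive = adversarial input, negative = clean input (assumed convention).
    """
    positives = tp + fn
    negatives = fp + tn
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "tpr": tp / positives,   # share of attacks that were blocked
        "fpr": fp / negatives,   # share of clean frames wrongly blocked
    }


# Illustrative counts only: 1000 adversarial and 1000 clean frames.
metrics = detection_metrics(tp=958, fp=6, tn=994, fn=42)
```

With a balanced clean/adversarial split, accuracy equals (TPR + 1 - FPR) / 2, which is consistent with the rows reported above.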

Protected VLA Models

Model	Type	Params
OpenVLA-7B	openvla	7B
Pi0-Fast	pi0_fast	3B
Pi0.5	pi0	3B
SmolVLA	smolvla	400M
BitVLA	bitvla_llava	3B
X-VLA-Pt	xvla	1.5B

Paper

AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models

arXiv: 2511.12149
Reference: github.com/lijayuTnT/AttackVLA

Files

File	Format	Size
defense_net.safetensors	SafeTensors	2.5MB
defense_net.pth	PyTorch	2.5MB
defense_net.onnx	ONNX	2.5MB
libero_eval_report.json	Eval metrics	9KB
train_real.toml	Training config	<1KB
anima_module.yaml	Module manifest	<1KB

Usage

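import torch
from anima_def_attackvla.models.defense_net import DefenseNet

# Load the trained weights; the checkpoint stores the state dict under "model".
model = DefenseNet()
ckpt = torch.load("defense_net.pth", map_location="cpu", weights_only=True)
model.load_state_dict(ckpt["model"])
model.eval().cuda()

# Dummy RGB batch standing in for a camera frame.
image = torch.rand(1, 3, 224, 224, device="cuda")

# Run the guard without tracking gradients.
with torch.no_grad():
    blocked, score, sanitized = model.detect_and_sanitize(image, threshold=0.5)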
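
Since the guard sits in front of a VLA model, one way to wire it in is a small gate that only forwards frames the guard does not block. This is a sketch under assumptions: `guarded_act` and `policy` are hypothetical names, the only interface taken from the usage snippet is the `(blocked, score, sanitized)` return value of `detect_and_sanitize`, and forwarding the sanitized frame rather than the original is a design choice made for this example.

```python
from typing import Any, Callable, Optional, Tuple


def guarded_act(
    detect_and_sanitize: Callable[..., Tuple[Any, Any, Any]],
    policy: Callable[[Any, str], Any],
    image: Any,
    instruction: str,
    threshold: float = 0.5,
) -> Tuple[Optional[Any], float]:
    """Run the guard first; call the VLA policy only if the frame is not blocked.

    Returns (action, score). The action is None when the frame is blocked,
    so the caller can fall back to a safe behavior such as stopping the robot.
    """
    blocked, score, sanitized = detect_and_sanitize(image, threshold=threshold)
    # bool()/float() also accept single-element tensors (batch size 1).
    if bool(blocked):
        return None, float(score)
    return policy(sanitized, instruction), float(score)
```

In practice `detect_and_sanitize` would be `model.detect_and_sanitize` from the snippet above, and `policy` would wrap whichever VLA model is being protected.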

License

MIT
