---
library_name: peft
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- saber
- adversarial-attack
- vla
- robotics
- lora
- grpo
- qwen2.5
- libero
license: bsd-3-clause
---
# SABER Attack Agent – Task Failure
This is a LoRA adapter for [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained as part of SABER, a stealthy agentic black-box attack framework for Vision-Language-Action (VLA) models.
[Paper](https://arxiv.org/abs/2603.24935) | [Code](https://github.com/wuxiyang1996/SABER) | Project Page
## Model Description
- **Objective:** `task_failure` – trained to induce task failure, i.e. the victim VLA fails to complete the instructed manipulation task.
- **Base model:** Qwen/Qwen2.5-3B-Instruct
- **Training pipeline:** cold-start SFT (GPT-4o distillation) → GRPO (Group Relative Policy Optimization) on the LIBERO benchmark
- **GRPO checkpoint step:** 150
- **LoRA config:** rank=8, alpha=16, applied to all attention and MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Tool set:** token-level (replace, remove, add, swap)
- **Victim VLA (training):** Pi0.5 (OpenPI flow-matching, ~2.7B params)
- **Evaluation benchmark:** LIBERO (4 suites: Spatial, Object, Goal, Long-Horizon)
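The four token-level tools can be sketched as simple edit operations on a tokenized instruction. This is an illustrative sketch only; the function names and signatures below are hypothetical, not the actual SABER tool API.

```python
# Hypothetical sketch of the four token-level perturbation tools
# (replace, remove, add, swap). Names and signatures are illustrative.

def replace_token(tokens, i, new):
    """Replace the token at index i with `new`."""
    out = list(tokens)
    out[i] = new
    return out

def remove_token(tokens, i):
    """Delete the token at index i."""
    return tokens[:i] + tokens[i + 1:]

def add_token(tokens, i, new):
    """Insert `new` before index i."""
    return tokens[:i] + [new] + tokens[i:]

def swap_tokens(tokens, i, j):
    """Exchange the tokens at indices i and j."""
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out

# Example: a stealthy two-token swap on a LIBERO-style instruction.
instruction = "pick up the black bowl".split()
print(" ".join(swap_tokens(instruction, 2, 3)))  # pick up black the bowl
```

In the real pipeline the agent chooses which tool to call (and where) via its ReAct tool-calling loop rather than applying edits directly.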
## Usage

### Quick Start
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Attach the SABER attack-agent LoRA adapter.
model = PeftModel.from_pretrained(
    base_model,
    "IntelligenceLab/saber-attack-agent-task-failure",
)
```
### Full SABER Pipeline
For the complete attack agent pipeline (ReAct tool-calling, VLA rollouts, reward computation), clone the full repository:
```bash
git clone https://github.com/wuxiyang1996/SABER
cd SABER
bash install.sh
```
Then run evaluation with this checkpoint:
```bash
python eval_attack_vla.py \
    --victim openpi_pi05 \
    --attack_base_model Qwen/Qwen2.5-3B-Instruct \
    --attack_model_name saber-attack-agent-task-failure \
    --objective task_failure \
    --attack_gpus 2,3 \
    --vla_gpu 0
```
See the full evaluation guide and RUN.md for detailed instructions.
### Training Your Own
```bash
python train_vla.py --objective task_failure
```
See Training the Attack Agent for all configuration options.
## How SABER Works
1. The attack agent (this model) receives a task instruction, an observation image, and the baseline rollout result from the frozen victim VLA.
2. It uses a ReAct-style tool-calling protocol with character-, token-, and prompt-level perturbation tools to edit the instruction.
3. The perturbed instruction is fed to the frozen victim VLA, which executes the task in LIBERO simulation.
4. A reward signal computed from behavioral differences drives GRPO training; no gradients flow through the victim.
## Key Results
On LIBERO, across 6 state-of-the-art VLA models, SABER achieves (versus a GPT-4o prompting baseline):
| Metric | SABER | GPT-4o Baseline |
|---|---|---|
| Task Success Reduction | 20.6% | 15.2% |
| Action Length Increase | 55% | 38% |
| Constraint Violation Increase | 33% | 22% |
| Avg. Tool Calls | 2.3 | 2.9 |
| Avg. Char Edits | 18.4 | 40.6 |
## Citation
```bibtex
@misc{wu2026saber,
  title={SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models},
  author={Xiyang Wu and Guangyao Shi and Qingzi Wang and Zongxia Li and Amrit Singh Bedi and Dinesh Manocha},
  year={2026},
  eprint={2603.24935},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.24935},
}
```
## License

BSD 3-Clause License. See [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE).