---
library_name: peft
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
  - saber
  - adversarial-attack
  - vla
  - robotics
  - lora
  - grpo
  - qwen2.5
  - libero
license: bsd-3-clause
---

# SABER Attack Agent: Task Failure

This is a LoRA adapter for [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained as part of the SABER framework, a stealthy agentic black-box attack system for Vision-Language-Action (VLA) models.

[Paper](https://arxiv.org/abs/2603.24935) | [Code](https://github.com/wuxiyang1996/SABER) | Project Page

## Model Description

- **Objective:** `task_failure`: trained to induce task failure, i.e. the victim VLA fails to complete the instructed manipulation task.
- **Base model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Training pipeline:** cold-start SFT (GPT-4o distillation) → GRPO (Group Relative Policy Optimization) on the LIBERO benchmark
- **GRPO checkpoint step:** 150
- **LoRA config:** rank = 8, alpha = 16, applied to all attention and MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Tool sets used:** token-level (replace, remove, add, swap)
- **Victim VLA (training):** Pi0.5 (OpenPI flow-matching, ~2.7B params)
- **Evaluation benchmark:** LIBERO (4 suites: Spatial, Object, Goal, Long-Horizon)
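The LoRA hyperparameters above map onto a `peft` `LoraConfig` roughly as follows. This is a sketch reconstructed from the listed values (rank, alpha, target modules), not necessarily the exact configuration used in training:

```python
from peft import LoraConfig

# Rank-8 LoRA with alpha 16 on all attention and MLP projections,
# matching the hyperparameters listed in the model description.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```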

## Usage

### Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach the SABER LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

model = PeftModel.from_pretrained(
    base_model,
    "IntelligenceLab/saber-attack-agent-task-failure",
)
```

### Full SABER Pipeline

For the complete attack agent pipeline (ReAct tool-calling, VLA rollouts, reward computation), clone the full repository:

```bash
git clone https://github.com/wuxiyang1996/SABER
cd SABER
bash install.sh
```

Then run evaluation with this checkpoint:

```bash
python eval_attack_vla.py \
    --victim openpi_pi05 \
    --attack_base_model Qwen/Qwen2.5-3B-Instruct \
    --attack_model_name saber-attack-agent-task-failure \
    --objective task_failure \
    --attack_gpus 2,3 \
    --vla_gpu 0
```

See the full evaluation guide and `RUN.md` in the repository for detailed instructions.

### Training Your Own

```bash
python train_vla.py --objective task_failure
```

See Training the Attack Agent for all configuration options.

## How SABER Works

1. The attack agent (this model) receives a task instruction, an observation image, and the baseline rollout result from the frozen victim VLA.
2. It uses a ReAct-style tool-calling protocol with character-, token-, and prompt-level perturbation tools to edit the instruction.
3. The perturbed instruction is fed to the frozen victim VLA, which executes the task in LIBERO simulation.
4. A reward signal derived from behavioral differences between the baseline and perturbed rollouts drives GRPO training; no gradients flow through the victim.
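The token-level tool set from step 2 (replace, remove, add, swap, as listed in the model description) can be sketched as pure functions over the instruction's token list. The function names and the whitespace tokenization here are illustrative assumptions; in the repository these operations are exposed to the agent as ReAct tool calls:

```python
# Hedged sketch of SABER-style token-level perturbation tools.
# Names and whitespace tokenization are illustrative, not the repo's API.

def _tokens(instruction: str) -> list[str]:
    return instruction.split()


def replace_token(instruction: str, index: int, new_token: str) -> str:
    """Replace the token at `index` with `new_token`."""
    toks = _tokens(instruction)
    toks[index] = new_token
    return " ".join(toks)


def remove_token(instruction: str, index: int) -> str:
    """Delete the token at `index`."""
    toks = _tokens(instruction)
    del toks[index]
    return " ".join(toks)


def add_token(instruction: str, index: int, new_token: str) -> str:
    """Insert `new_token` before position `index`."""
    toks = _tokens(instruction)
    toks.insert(index, new_token)
    return " ".join(toks)


def swap_tokens(instruction: str, i: int, j: int) -> str:
    """Exchange the tokens at positions `i` and `j`."""
    toks = _tokens(instruction)
    toks[i], toks[j] = toks[j], toks[i]
    return " ".join(toks)


if __name__ == "__main__":
    task = "pick up the red mug"
    print(swap_tokens(task, 3, 4))  # "pick up the mug red"
```

The agent chains a small number of such edits (2.3 tool calls on average, per the results below) so that the perturbed instruction stays close to the original while still derailing the victim.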

## Key Results

On LIBERO, across six state-of-the-art VLA models, SABER achieves:

| Metric | SABER | GPT-4o Baseline |
|---|---|---|
| Task Success Reduction | 20.6% | 15.2% |
| Action Length Increase | 55% | 38% |
| Constraint Violation Increase | 33% | 22% |
| Avg. Tool Calls | 2.3 | 2.9 |
| Avg. Char Edits | 18.4 | 40.6 |

## Citation

```bibtex
@misc{wu2026saber,
      title={SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models},
      author={Xiyang Wu and Guangyao Shi and Qingzi Wang and Zongxia Li and Amrit Singh Bedi and Dinesh Manocha},
      year={2026},
      eprint={2603.24935},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.24935},
}
```

## License

BSD 3-Clause License. See [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE).