---
library_name: peft
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
  - saber
  - adversarial-attack
  - vla
  - robotics
  - lora
  - grpo
  - qwen2.5
  - libero
license: bsd-3-clause
---

# SABER Attack Agent: Task Failure

This is a LoRA adapter for [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained as part of the SABER framework, a stealthy agentic black-box attack system for Vision-Language-Action (VLA) models.

[Paper](https://arxiv.org/abs/2603.24935) | [Code](https://github.com/wuxiyang1996/SABER) | Project Page

## Model Description

- **Objective:** `task_failure`: trained to induce task failure, i.e. the victim VLA fails to complete the instructed manipulation task.
- **Base model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Training pipeline:** cold-start SFT (GPT-4o distillation) → GRPO (Group Relative Policy Optimization) on the LIBERO benchmark
- **GRPO checkpoint step:** 150
- **LoRA config:** rank = 8, alpha = 16, applied to all attention and MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Tool sets used:** token-level (replace, remove, add, swap)
- **Victim VLA (training):** Pi0.5 (OpenPI flow-matching, ~2.7B params)
- **Evaluation benchmark:** LIBERO (4 suites: Spatial, Object, Goal, Long-Horizon)
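The LoRA hyperparameters above map onto a `peft` `LoraConfig` roughly as follows. This is a sketch reconstructed from the listed values (rank, alpha, target modules), not necessarily the exact configuration used in training:

```python
from peft import LoraConfig

# Rank-8 LoRA with alpha 16 on all attention and MLP projections,
# matching the hyperparameters listed in the model description.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```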

## Usage

### Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach the SABER LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

model = PeftModel.from_pretrained(
    base_model,
    "IntelligenceLab/saber-attack-agent-task-failure",
)
```

### Full SABER Pipeline

For the complete attack agent pipeline (ReAct tool-calling, VLA rollouts, reward computation), clone the full repository:

```bash
git clone https://github.com/wuxiyang1996/SABER
cd SABER
bash install.sh
```

Then run evaluation with this checkpoint:

```bash
python eval_attack_vla.py \
    --victim openpi_pi05 \
    --attack_base_model Qwen/Qwen2.5-3B-Instruct \
    --attack_model_name saber-attack-agent-task-failure \
    --objective task_failure \
    --attack_gpus 2,3 \
    --vla_gpu 0
```

See the full evaluation guide and `RUN.md` in the repository for detailed instructions.

### Training Your Own

```bash
python train_vla.py --objective task_failure
```

See Training the Attack Agent for all configuration options.

## How SABER Works

1. The attack agent (this model) receives a task instruction, an observation image, and the baseline rollout result from the frozen victim VLA.
2. It uses a ReAct-style tool-calling protocol with character-, token-, and prompt-level perturbation tools to edit the instruction.
3. The perturbed instruction is fed to the frozen victim VLA, which executes the task in LIBERO simulation.
4. A reward signal derived from behavioral differences between the baseline and perturbed rollouts drives GRPO training; no gradients flow through the victim.
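The token-level tool set from step 2 (replace, remove, add, swap, as listed in the model description) can be sketched as pure functions over the instruction's token list. The function names and the whitespace tokenization here are illustrative assumptions; in the repository these operations are exposed to the agent as ReAct tool calls:

```python
# Hedged sketch of SABER-style token-level perturbation tools.
# Names and whitespace tokenization are illustrative, not the repo's API.

def _tokens(instruction: str) -> list[str]:
    return instruction.split()


def replace_token(instruction: str, index: int, new_token: str) -> str:
    """Replace the token at `index` with `new_token`."""
    toks = _tokens(instruction)
    toks[index] = new_token
    return " ".join(toks)


def remove_token(instruction: str, index: int) -> str:
    """Delete the token at `index`."""
    toks = _tokens(instruction)
    del toks[index]
    return " ".join(toks)


def add_token(instruction: str, index: int, new_token: str) -> str:
    """Insert `new_token` before position `index`."""
    toks = _tokens(instruction)
    toks.insert(index, new_token)
    return " ".join(toks)


def swap_tokens(instruction: str, i: int, j: int) -> str:
    """Exchange the tokens at positions `i` and `j`."""
    toks = _tokens(instruction)
    toks[i], toks[j] = toks[j], toks[i]
    return " ".join(toks)


if __name__ == "__main__":
    task = "pick up the red mug"
    print(swap_tokens(task, 3, 4))  # "pick up the mug red"
```

The agent chains a small number of such edits (2.3 tool calls on average, per the results below) so that the perturbed instruction stays close to the original while still derailing the victim.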

## Key Results

On LIBERO, across six state-of-the-art VLA models, SABER achieves:

| Metric | SABER | GPT-4o Baseline |
|---|---|---|
| Task Success Reduction | 20.6% | 15.2% |
| Action Length Increase | 55% | 38% |
| Constraint Violation Increase | 33% | 22% |
| Avg. Tool Calls | 2.3 | 2.9 |
| Avg. Char Edits | 18.4 | 40.6 |

## Citation

```bibtex
@misc{wu2026saber,
      title={SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models},
      author={Xiyang Wu and Guangyao Shi and Qingzi Wang and Zongxia Li and Amrit Singh Bedi and Dinesh Manocha},
      year={2026},
      eprint={2603.24935},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.24935},
}
```

## License

BSD 3-Clause License. See [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE).