wuxiyang committed · Commit 40f1a01 · verified · Parent: c5c3ed4

Add model card with usage instructions and GitHub links

Files changed (1): README.md (+121, added)
---
library_name: peft
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- saber
- adversarial-attack
- vla
- robotics
- lora
- grpo
- qwen2.5
- libero
license: bsd-3-clause
---

# SABER Attack Agent — Constraint Violation

This is a **LoRA adapter** for [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained as part of the **SABER** framework — a stealthy agentic black-box attack system for Vision-Language-Action (VLA) models.

**[Paper](https://arxiv.org/abs/2603.24935)** | **[Code](https://github.com/wuxiyang1996/SABER)** | **[Project Page](https://github.com/wuxiyang1996/SABER)**

## Model Description

- **Objective:** `constraint_violation`, trained to make the victim VLA collide with objects or violate spatial boundaries more frequently.
- **Base model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Training pipeline:** cold-start SFT (GPT-4o distillation) → **GRPO** (Group Relative Policy Optimization) on the LIBERO benchmark
- **GRPO checkpoint step:** 150
- **LoRA config:** rank = 8, alpha = 16, applied to all attention and MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Tool sets used:** prompt-level tools (clause injection, constraint stacking, etc.)
- **Victim VLA (training):** Pi0.5 (OpenPI flow-matching, ~2.7B parameters)
- **Evaluation benchmark:** LIBERO (4 suites: Spatial, Object, Goal, Long-Horizon)

## Usage

### Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach this LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

model = PeftModel.from_pretrained(
    base_model,
    "IntelligenceLab/saber-attack-agent-constraint-violation",
)
```

### Full SABER Pipeline

For the complete attack agent pipeline (ReAct tool-calling, VLA rollouts, reward computation), clone the full repository:

```bash
git clone https://github.com/wuxiyang1996/SABER
cd SABER
bash install.sh
```

Then run evaluation with this checkpoint:

```bash
python eval_attack_vla.py \
    --victim openpi_pi05 \
    --attack_base_model Qwen/Qwen2.5-3B-Instruct \
    --attack_model_name saber-attack-agent-constraint-violation \
    --objective constraint_violation \
    --attack_gpus 2,3 \
    --vla_gpu 0
```

See the [full evaluation guide](https://github.com/wuxiyang1996/SABER#evaluation) and [RUN.md](https://github.com/wuxiyang1996/SABER/blob/main/RUN.md) for detailed instructions.

### Training Your Own

```bash
python train_vla.py --objective constraint_violation
```

See [Training the Attack Agent](https://github.com/wuxiyang1996/SABER#training-the-attack-agent) for all configuration options.

## How SABER Works

1. The **attack agent** (this model) receives a task instruction, observation image, and baseline rollout result from the frozen victim VLA.
2. It uses a **ReAct-style tool-calling protocol** with character-, token-, and prompt-level perturbation tools to edit the instruction.
3. The perturbed instruction is fed to the **frozen victim VLA**, which executes the task in LIBERO simulation.
4. A **reward signal** from behavioral differences drives GRPO training — no gradients flow through the victim.

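In miniature, steps 1–4 amount to the loop below. The perturbation tool, victim rollout, and reward here are toy stand-ins for SABER's actual tools, LIBERO simulation, and GRPO reward, shown only to make the data flow concrete:

```python
import random

BASELINE = "pick up the black bowl and place it on the plate"

def char_swap(instruction: str, rng: random.Random) -> str:
    """Toy character-level tool: swap two adjacent characters."""
    if len(instruction) < 2:
        return instruction
    i = rng.randrange(len(instruction) - 1)
    chars = list(instruction)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def victim_rollout(instruction: str) -> dict:
    """Stand-in for the frozen victim VLA: returns fake behavioral metrics.
    In SABER this is a LIBERO simulation rollout, not a stub."""
    intact = instruction == BASELINE
    return {"success": intact, "violations": 0 if intact else 1}

rng = random.Random(0)
perturbed = char_swap(BASELINE, rng)       # step 2: edit the instruction
base = victim_rollout(BASELINE)            # step 1: baseline rollout
attacked = victim_rollout(perturbed)       # step 3: rollout on the perturbed text

# Step 4: reward from behavioral differences (more violations, fewer successes);
# no gradients flow through victim_rollout.
reward = (attacked["violations"] - base["violations"]) + (base["success"] - attacked["success"])
print(reward)  # prints 2
```

In the real framework this scalar reward scores groups of sampled perturbations for GRPO updates, while the victim stays frozen throughout.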
## Key Results

On LIBERO across 6 state-of-the-art VLA models, SABER achieves:

| Metric | SABER | GPT-4o Baseline |
|--------|-------|-----------------|
| Task Success Reduction | **20.6%** | 15.2% |
| Action Length Increase | **55%** | 38% |
| Constraint Violation Increase | **33%** | 22% |
| Avg. Tool Calls | **2.3** | 2.9 |
| Avg. Char Edits | **18.4** | 40.6 |

## Citation

```bibtex
@misc{wu2026saber,
  title={SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models},
  author={Xiyang Wu and Guangyao Shi and Qingzi Wang and Zongxia Li and Amrit Singh Bedi and Dinesh Manocha},
  year={2026},
  eprint={2603.24935},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.24935},
}
```

## License

BSD 3-Clause License. See [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE).