wuxiyang committed · Commit 40f1a01 · verified · Parent: c5c3ed4

Add model card with usage instructions and GitHub links

Files changed (1): README.md (+121, added)
---
library_name: peft
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- saber
- adversarial-attack
- vla
- robotics
- lora
- grpo
- qwen2.5
- libero
license: bsd-3-clause
---

# SABER Attack Agent — Constraint Violation

This is a **LoRA adapter** for [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained as part of the **SABER** framework — a stealthy agentic black-box attack system for Vision-Language-Action (VLA) models.

**[Paper](https://arxiv.org/abs/2603.24935)** | **[Code](https://github.com/wuxiyang1996/SABER)** | **[Project Page](https://github.com/wuxiyang1996/SABER)**

## Model Description

- **Objective:** `constraint_violation`, trained to make the victim VLA collide with objects or violate spatial boundaries more frequently.
- **Base model:** [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
- **Training pipeline:** cold-start SFT (GPT-4o distillation) → **GRPO** (Group Relative Policy Optimization) on the LIBERO benchmark
- **GRPO checkpoint step:** 150
- **LoRA config:** rank = 8, alpha = 16, applied to all attention and MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
- **Tool sets used:** prompt-level tools (clause injection, constraint stacking, etc.)
- **Victim VLA (training):** Pi0.5 (OpenPI flow-matching, ~2.7B parameters)
- **Evaluation benchmark:** LIBERO (4 suites: Spatial, Object, Goal, Long-Horizon)

## Usage

### Quick Start

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach this LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

model = PeftModel.from_pretrained(
    base_model,
    "IntelligenceLab/saber-attack-agent-constraint-violation",
)
```

### Full SABER Pipeline

For the complete attack agent pipeline (ReAct tool-calling, VLA rollouts, reward computation), clone the full repository:

```bash
git clone https://github.com/wuxiyang1996/SABER
cd SABER
bash install.sh
```

Then run evaluation with this checkpoint:

```bash
python eval_attack_vla.py \
    --victim openpi_pi05 \
    --attack_base_model Qwen/Qwen2.5-3B-Instruct \
    --attack_model_name saber-attack-agent-constraint-violation \
    --objective constraint_violation \
    --attack_gpus 2,3 \
    --vla_gpu 0
```

See the [full evaluation guide](https://github.com/wuxiyang1996/SABER#evaluation) and [RUN.md](https://github.com/wuxiyang1996/SABER/blob/main/RUN.md) for detailed instructions.

### Training Your Own

```bash
python train_vla.py --objective constraint_violation
```

See [Training the Attack Agent](https://github.com/wuxiyang1996/SABER#training-the-attack-agent) for all configuration options.

## How SABER Works

1. The **attack agent** (this model) receives a task instruction, observation image, and baseline rollout result from the frozen victim VLA.
2. It uses a **ReAct-style tool-calling protocol** with character-, token-, and prompt-level perturbation tools to edit the instruction.
3. The perturbed instruction is fed to the **frozen victim VLA**, which executes the task in LIBERO simulation.
4. A **reward signal** from behavioral differences drives GRPO training — no gradients flow through the victim.

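In miniature, steps 1–4 amount to the loop below. The perturbation tool, victim rollout, and reward here are toy stand-ins for SABER's actual tools, LIBERO simulation, and GRPO reward, shown only to make the data flow concrete:

```python
import random

BASELINE = "pick up the black bowl and place it on the plate"

def char_swap(instruction: str, rng: random.Random) -> str:
    """Toy character-level tool: swap two adjacent characters."""
    if len(instruction) < 2:
        return instruction
    i = rng.randrange(len(instruction) - 1)
    chars = list(instruction)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def victim_rollout(instruction: str) -> dict:
    """Stand-in for the frozen victim VLA: returns fake behavioral metrics.
    In SABER this is a LIBERO simulation rollout, not a stub."""
    intact = instruction == BASELINE
    return {"success": intact, "violations": 0 if intact else 1}

rng = random.Random(0)
perturbed = char_swap(BASELINE, rng)       # step 2: edit the instruction
base = victim_rollout(BASELINE)            # step 1: baseline rollout
attacked = victim_rollout(perturbed)       # step 3: rollout on the perturbed text

# Step 4: reward from behavioral differences (more violations, fewer successes);
# no gradients flow through victim_rollout.
reward = (attacked["violations"] - base["violations"]) + (base["success"] - attacked["success"])
print(reward)  # prints 2
```

In the real framework this scalar reward scores groups of sampled perturbations for GRPO updates, while the victim stays frozen throughout.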
## Key Results

On LIBERO across 6 state-of-the-art VLA models, SABER achieves:

| Metric | SABER | GPT-4o Baseline |
|--------|-------|-----------------|
| Task Success Reduction | **20.6%** | 15.2% |
| Action Length Increase | **55%** | 38% |
| Constraint Violation Increase | **33%** | 22% |
| Avg. Tool Calls | **2.3** | 2.9 |
| Avg. Char Edits | **18.4** | 40.6 |

## Citation

```bibtex
@misc{wu2026saber,
  title={SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models},
  author={Xiyang Wu and Guangyao Shi and Qingzi Wang and Zongxia Li and Amrit Singh Bedi and Dinesh Manocha},
  year={2026},
  eprint={2603.24935},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.24935},
}
```

## License

BSD 3-Clause License. See [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE).