Simplify model card: brief, clear LoRA description
# SABER Attack Agent — Task Failure

**LoRA adapter** (rank 8) for [`Qwen/Qwen2.5-3B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained with GRPO to generate adversarial instruction perturbations aimed at inducing task failure in victim VLA models.

Part of the **SABER** framework: **[Paper](https://arxiv.org/abs/2603.24935)** | **[GitHub](https://github.com/wuxiyang1996/SABER)**

## Details

| | |
|---|---|
| **Type** | LoRA adapter (`adapter_model.safetensors`) |
| **Base model** | [`Qwen/Qwen2.5-3B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) |
| **Attack objective** | `task_failure` |
| **Training** | Cold-start SFT → GRPO (step 150) on LIBERO |
| **LoRA config** | r=8, alpha=16, all attention + MLP projections |
| **Victim VLA (training)** | Pi0.5 (OpenPI) |
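The LoRA hyperparameters in the table map onto the keyword arguments of `peft.LoraConfig`. A minimal sketch of that mapping, assuming the Qwen2.5 module names; the exact training configuration lives in the SABER repository and may differ:

```python
# Hypothetical reconstruction of the adapter's LoRA hyperparameters.
# "All attention + MLP projections" on Qwen2.5 means these seven linear layers:
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "gate_proj", "up_proj", "down_proj",     # MLP projections
]

lora_config_kwargs = {
    "r": 8,            # LoRA rank, from the Details table
    "lora_alpha": 16,  # scaling alpha
    "target_modules": target_modules,
    "task_type": "CAUSAL_LM",
}
# With peft installed: config = peft.LoraConfig(**lora_config_kwargs)
```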
## Quick Start
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base, "IntelligenceLab/saber-attack-agent-task-failure")
```
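Once loaded, the adapter is queried like any chat model. The actual prompt schema is defined in the SABER repository; as a hypothetical illustration only, the attack agent's input bundles the task instruction and the baseline rollout outcome into a chat turn (the helper name, system text, and field layout below are assumptions):

```python
# Hypothetical helper: assemble the attack agent's input as a chat message
# list. The real prompt template lives in the SABER repo; this is a sketch.
def build_attack_prompt(instruction: str, baseline_result: str) -> list[dict]:
    system = (
        "You are an adversarial instruction editor. Propose a minimally "
        "edited instruction that causes the robot policy to fail."
    )
    user = f"Task instruction: {instruction}\nBaseline rollout: {baseline_result}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_attack_prompt("close the top drawer", "success")
# Feed to the loaded model, e.g.:
# inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
# out = model.generate(inputs, max_new_tokens=128)
```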
## Full Pipeline

For the complete attack pipeline (ReAct tool-calling, VLA rollouts, LIBERO evaluation):

```bash
git clone https://github.com/wuxiyang1996/SABER && cd SABER && bash install.sh

python eval_attack_vla.py \
    --victim openpi_pi05 \
    --objective task_failure \
    --attack_gpus 2,3 --vla_gpu 0
```
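The tool-calling agent edits instructions at several granularities (character-, token-, and prompt-level perturbation tools). As a flavor of what a character-level tool can do, here is a minimal typo-style edit; the actual tool set is implemented in the SABER repository and this sketch is not taken from it:

```python
# Illustrative character-level perturbation: swap two adjacent characters,
# producing a typo-style edit of the instruction. Sketch only; the real
# perturbation tools live in the SABER repository.
def swap_adjacent(text: str, i: int) -> str:
    """Swap the characters at positions i and i+1."""
    if not 0 <= i < len(text) - 1:
        raise ValueError("index out of range for a swap")
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

perturbed = swap_adjacent("close the drawer", 1)  # "colse the drawer"
```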
See the [GitHub repo](https://github.com/wuxiyang1996/SABER) for training, evaluation, and cross-model transfer instructions.

## Citation
```bibtex
  eprint={2603.24935},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
}
```

## License

BSD 3-Clause. See the [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE) in the SABER repository.