Simplify model card: brief, clear LoRA description
# SABER Attack Agent — Task Failure

**LoRA adapter** (rank 8) for [`Qwen/Qwen2.5-3B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained with GRPO to generate adversarial instruction perturbations aimed at inducing task failure in victim VLA models.

Part of the **SABER** framework: **[Paper](https://arxiv.org/abs/2603.24935)** | **[GitHub](https://github.com/wuxiyang1996/SABER)**

## Details

| | |
|---|---|
| **Type** | LoRA adapter (`adapter_model.safetensors`) |
| **Base model** | [`Qwen/Qwen2.5-3B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) |
| **Attack objective** | `task_failure` |
| **Training** | Cold-start SFT → GRPO (step 150) on LIBERO |
| **LoRA config** | r=8, alpha=16, all attention + MLP projections |
| **Victim VLA (training)** | Pi0.5 (OpenPI) |
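The LoRA hyperparameters in the table map onto the keyword arguments of `peft.LoraConfig`. A minimal sketch of that mapping, assuming the Qwen2.5 module names; the exact training configuration lives in the SABER repository and may differ:

```python
# Hypothetical reconstruction of the adapter's LoRA hyperparameters.
# "All attention + MLP projections" on Qwen2.5 means these seven linear layers:
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "gate_proj", "up_proj", "down_proj",     # MLP projections
]

lora_config_kwargs = {
    "r": 8,            # LoRA rank, from the Details table
    "lora_alpha": 16,  # scaling alpha
    "target_modules": target_modules,
    "task_type": "CAUSAL_LM",
}
# With peft installed: config = peft.LoraConfig(**lora_config_kwargs)
```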
## Quick Start
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base, "IntelligenceLab/saber-attack-agent-task-failure")
```
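Once loaded, the adapter is queried like any chat model. The actual prompt schema is defined in the SABER repository; as a hypothetical illustration only, the attack agent's input bundles the task instruction and the baseline rollout outcome into a chat turn (the helper name, system text, and field layout below are assumptions):

```python
# Hypothetical helper: assemble the attack agent's input as a chat message
# list. The real prompt template lives in the SABER repo; this is a sketch.
def build_attack_prompt(instruction: str, baseline_result: str) -> list[dict]:
    system = (
        "You are an adversarial instruction editor. Propose a minimally "
        "edited instruction that causes the robot policy to fail."
    )
    user = f"Task instruction: {instruction}\nBaseline rollout: {baseline_result}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_attack_prompt("close the top drawer", "success")
# Feed to the loaded model, e.g.:
# inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
# out = model.generate(inputs, max_new_tokens=128)
```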
## Full Pipeline

For the complete attack pipeline (ReAct tool-calling, VLA rollouts, LIBERO evaluation):

```bash
git clone https://github.com/wuxiyang1996/SABER && cd SABER && bash install.sh

python eval_attack_vla.py \
    --victim openpi_pi05 \
    --objective task_failure \
    --attack_gpus 2,3 --vla_gpu 0
```
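The tool-calling agent edits instructions at several granularities (character-, token-, and prompt-level perturbation tools). As a flavor of what a character-level tool can do, here is a minimal typo-style edit; the actual tool set is implemented in the SABER repository and this sketch is not taken from it:

```python
# Illustrative character-level perturbation: swap two adjacent characters,
# producing a typo-style edit of the instruction. Sketch only; the real
# perturbation tools live in the SABER repository.
def swap_adjacent(text: str, i: int) -> str:
    """Swap the characters at positions i and i+1."""
    if not 0 <= i < len(text) - 1:
        raise ValueError("index out of range for a swap")
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

perturbed = swap_adjacent("close the drawer", 1)  # "colse the drawer"
```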
See the [GitHub repo](https://github.com/wuxiyang1996/SABER) for training, evaluation, and cross-model transfer instructions.

## Citation
```bibtex
  eprint={2603.24935},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
}
```

## License

BSD 3-Clause. See the [LICENSE](https://github.com/wuxiyang1996/SABER/blob/main/LICENSE) in the SABER repository.