update piguard
Browse files- README.md +4 -3
- assets/Results.png +2 -2
- assets/figure_performance.png +2 -2
- assets/visualization_concat.png +2 -2
README.md
CHANGED
|
@@ -10,9 +10,12 @@ metrics:
|
|
| 10 |
library_name: transformers
|
| 11 |
---
|
| 12 |
- Website: https://injecguard.github.io/
|
| 13 |
-
- Paper: https://
|
| 14 |
- Code Repo: https://github.com/leolee99/PIGuard
|
| 15 |
|
|
|
|
|
|
|
|
|
|
| 16 |
## Abstract
|
| 17 |
|
| 18 |
Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense—falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce ***NotInject***, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60\%). To mitigate this, we propose ***PIGuard***, a novel prompt guard model that incorporates a new training strategy, *Mitigating Over-defense for Free* (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8\%, offering a robust and open-source solution for detecting prompt injection attacks.
|
|
@@ -58,7 +61,6 @@ We have released an online demo, you can access it [here](InjecGuard.github.io).
|
|
| 58 |
|
| 59 |
If you find this work useful in your research or applications, we appreciate that if you can kindly cite:
|
| 60 |
|
| 61 |
-
```
|
| 62 |
```
|
| 63 |
@articles{PIGuard,
|
| 64 |
title={PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
|
|
@@ -69,5 +71,4 @@ If you find this work useful in your research or applications, we appreciate tha
|
|
| 69 |
journal = {ACL},
|
| 70 |
year={2025}
|
| 71 |
}
|
| 72 |
-
```
|
| 73 |
```
|
|
|
|
| 10 |
library_name: transformers
|
| 11 |
---
|
| 12 |
- Website: https://injecguard.github.io/
|
| 13 |
+
- Paper: https://aclanthology.org/2025.acl-long.1468.pdf
|
| 14 |
- Code Repo: https://github.com/leolee99/PIGuard
|
| 15 |
|
| 16 |
+
## News
|
| 17 |
+
Due to some licensing issues, the model name has been changed from **InjecGuard** to **PIGuard**. We apologize for any inconvenience this may have caused.
|
| 18 |
+
|
| 19 |
## Abstract
|
| 20 |
|
| 21 |
Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense—falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce ***NotInject***, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60\%). To mitigate this, we propose ***PIGuard***, a novel prompt guard model that incorporates a new training strategy, *Mitigating Over-defense for Free* (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8\%, offering a robust and open-source solution for detecting prompt injection attacks.
|
|
|
|
| 61 |
|
| 62 |
If you find this work useful in your research or applications, we appreciate that if you can kindly cite:
|
| 63 |
|
|
|
|
| 64 |
```
|
| 65 |
@articles{PIGuard,
|
| 66 |
title={PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
|
|
|
|
| 71 |
journal = {ACL},
|
| 72 |
year={2025}
|
| 73 |
}
|
|
|
|
| 74 |
```
|
assets/Results.png
CHANGED
|
Git LFS Details
|
|
Git LFS Details
|
assets/figure_performance.png
CHANGED
|
Git LFS Details
|
|
Git LFS Details
|
assets/visualization_concat.png
CHANGED
|
Git LFS Details
|
|
Git LFS Details
|