Commit ddff6fd by leolee99 · 1 parent: 5b65d96

update piguard

README.md CHANGED
````diff
@@ -10,9 +10,12 @@ metrics:
 library_name: transformers
 ---
 - Website: https://injecguard.github.io/
-- Paper: https://arxiv.org/pdf/2410.22770
+- Paper: https://aclanthology.org/2025.acl-long.1468.pdf
 - Code Repo: https://github.com/leolee99/PIGuard
 
+## News
+Due to some licensing issues, the model name has been changed from **InjecGuard** to **PIGuard**. We apologize for any inconvenience this may have caused.
+
 ## Abstract
 
 Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense—falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce ***NotInject***, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60\%). To mitigate this, we propose ***PIGuard***, a novel prompt guard model that incorporates a new training strategy, *Mitigating Over-defense for Free* (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8\%, offering a robust and open-source solution for detecting prompt injection attacks.
@@ -58,7 +61,6 @@ We have released an online demo, you can access it [here](InjecGuard.github.io).
 
 If you find this work useful in your research or applications, we appreciate that if you can kindly cite:
 
-```
 ```
 @articles{PIGuard,
 title={PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
@@ -69,5 +71,4 @@ If you find this work useful in your research or applications, we appreciate tha
 journal = {ACL},
 year={2025}
 }
-```
 ```
````
assets/Results.png CHANGED

Git LFS Details (before)

  • SHA256: 6651e95b1f4c10db0d70cda1de224e7f9826f7cc125a8ce2cb7179a9b0e53d43
  • Pointer size: 131 Bytes
  • Size of remote file: 333 kB

Git LFS Details (after)

  • SHA256: 439a07a1d9a540947bce17885f87dd63f21185df39425298bfe2ccbf323afa10
  • Pointer size: 131 Bytes
  • Size of remote file: 341 kB
assets/figure_performance.png CHANGED

Git LFS Details (before)

  • SHA256: 33ac213db1bfdb433340dcda2de860f2e447dda2485119e71b5d87571971ce82
  • Pointer size: 132 Bytes
  • Size of remote file: 8.47 MB

Git LFS Details (after)

  • SHA256: be6d1d1ea76281d85bb04876137aeb4ad79a2f2197fc896af99b6443cdccf4d3
  • Pointer size: 131 Bytes
  • Size of remote file: 486 kB
assets/visualization_concat.png CHANGED

Git LFS Details (before)

  • SHA256: f56a1de00be3c6e84db358c90aa916df4582c09c8a991ef2676f33297cc31b0c
  • Pointer size: 132 Bytes
  • Size of remote file: 4.06 MB

Git LFS Details (after)

  • SHA256: 69004a7eb42e4be2c4760b55bbbd4ae737a6b10c577c157871774593b9b84a02
  • Pointer size: 131 Bytes
  • Size of remote file: 398 kB
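The "Pointer size" values in these Git LFS details come from the Git LFS v1 pointer format. Instead of storing each image, Git stores a three-line text file (`version ...`, `oid sha256:<digest>`, `size <bytes>`), each line ending in a newline. A minimal sketch, assuming that format. The helper name `lfs_pointer` is hypothetical, and the byte counts are illustrative because the page only shows rounded sizes. The pointer length works out to 125 bytes plus the number of digits in the file size. As a result, the kB-sized files (6-digit byte counts) get 131-byte pointers and the MB-sized files (7-digit byte counts) get 132-byte pointers.

```python
# Sketch of the Git LFS v1 pointer-file format, used to show why the
# "Pointer size" is 131 bytes for some assets and 132 bytes for others.

def lfs_pointer(sha256_hex: str, size_bytes: int) -> bytes:
    """Build a Git LFS v1 pointer file (hypothetical helper, LF line endings)."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    lines = [
        "version https://git-lfs.github.com/spec/v1",  # 43 bytes with newline
        f"oid sha256:{sha256_hex}",                     # 76 bytes with newline
        f"size {size_bytes}",                           # 6 bytes + digit count
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    # SHA-256 taken from the first assets/Results.png entry above.
    oid = "6651e95b1f4c10db0d70cda1de224e7f9826f7cc125a8ce2cb7179a9b0e53d43"
    # Illustrative byte counts only: the page shows rounded kB / MB sizes.
    for size in (333_000, 8_470_000):
        pointer = lfs_pointer(oid, size)
        print(size, len(pointer))
```

Run as a script, this prints `333000 131` and `8470000 132`. These match the 131-byte and 132-byte pointers listed for the kB-sized and MB-sized files.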