update piguard

Browse files

Files changed (4) hide show

README.md +4 -3
assets/Results.png +2 -2
assets/figure_performance.png +2 -2
assets/visualization_concat.png +2 -2

README.md CHANGED Viewed

@@ -10,9 +10,12 @@ metrics:
 library_name: transformers
 ---
 - Website: https://injecguard.github.io/
-- Paper: https://arxiv.org/pdf/2410.22770
 - Code Repo: https://github.com/leolee99/PIGuard
 ## Abstract
 Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense—falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce ***NotInject***, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60\%). To mitigate this, we propose ***PIGuard***, a novel prompt guard model that incorporates a new training strategy, *Mitigating Over-defense for Free* (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8\%, offering a robust and open-source solution for detecting prompt injection attacks.
@@ -58,7 +61,6 @@ We have released an online demo, you can access it [here](InjecGuard.github.io).
 If you find this work useful in your research or applications, we appreciate that if you can kindly cite:
-```
 ```
 @articles{PIGuard,
   title={PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
@@ -69,5 +71,4 @@ If you find this work useful in your research or applications, we appreciate tha
   journal = {ACL},
   year={2025}
 }
-```
 ```

 library_name: transformers
 ---
 - Website: https://injecguard.github.io/
+- Paper: https://aclanthology.org/2025.acl-long.1468.pdf
 - Code Repo: https://github.com/leolee99/PIGuard
+## News
+Due to some licensing issues, the model name has been changed from **InjecGuard** to **PIGuard**. We apologize for any inconvenience this may have caused.
 ## Abstract
 Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense—falsely flagging benign inputs as malicious due to trigger word bias. To address this issue, we introduce ***NotInject***, an evaluation dataset that systematically measures over-defense across various prompt guard models. NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation. Our results show that state-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60\%). To mitigate this, we propose ***PIGuard***, a novel prompt guard model that incorporates a new training strategy, *Mitigating Over-defense for Free* (MOF), which significantly reduces the bias on trigger words. InjecGuard demonstrates state-of-the-art performance on diverse benchmarks including NotInject, surpassing the existing best model by 30.8\%, offering a robust and open-source solution for detecting prompt injection attacks.
 If you find this work useful in your research or applications, we appreciate that if you can kindly cite:
 ```
 @articles{PIGuard,
   title={PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free},
   journal = {ACL},
   year={2025}
 }
 ```