davidberenstein1957 committed · verified
Commit 586ff79 · 1 parent: a2d153d

Update README.md

Files changed (1): README.md (+15 −13)
README.md CHANGED
@@ -18,19 +18,21 @@ One guardrail for all, all guardrails for one!
 
  ### Which guards are available?
 
- Below is an overview of all guardrails, showing the best results for the smallest (-2m), best-performing, and multi-lingual models across all configurations.
+ Below is an overview of all guardrails, showing the best results for the smallest (2m), best-performing, and multi-lingual (128m) models across all configurations.
 
  | Dataset | Classifies | Collection | Smallest (2m) | Best Performing | Multi-lingual (128m) |
  | --- | --- | --- | --- | --- | --- |
- | [harmfulness-mix](https://huggingface.co/datasets/nicholasKluge/harmful-text) | prompt-harmfulness | [Collection](https://huggingface.co/collections/enguard/prompt-harmfulness-harmfulness-mix) | [0.9192](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-harmfulness-harmfulness-mix) | [0.9350](https://huggingface.co/enguard/small-guard-32m-en-prompt-harmfulness-harmfulness-mix) | - |
- | [intel](https://huggingface.co/datasets/Intel/polite-guard) | response-politeness | [Collection](https://huggingface.co/collections/enguard/response-politeness-intel) | [0.8795](https://huggingface.co/enguard/tiny-guard-2m-en-response-politeness-intel) | [0.8908](https://huggingface.co/enguard/medium-guard-128m-xx-response-politeness-intel) | [0.8908](https://huggingface.co/enguard/medium-guard-128m-xx-response-politeness-intel) |
- | [jailbreak-in-the-wild](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts) | prompt-jailbreak | [Collection](https://huggingface.co/collections/enguard/prompt-jailbreak-jailbreak-in-the-wild) | [0.8515](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-jailbreak-jailbreak-in-the-wild) | [0.8905](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-jailbreak-jailbreak-in-the-wild) | [0.8905](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-jailbreak-jailbreak-in-the-wild) |
- | [jailbreak-sok](https://huggingface.co/datasets/youbin2014/JailbreakDB) | prompt-jailbreak | [Collection](https://huggingface.co/collections/enguard/prompt-jailbreak-jailbreak-sok) | [0.9762](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-jailbreak-jailbreak-sok) | [0.9810](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-jailbreak-jailbreak-sok) | [0.9810](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-jailbreak-jailbreak-sok) |
- | [jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred) | prompt-toxicity | [Collection](https://huggingface.co/collections/enguard/prompt-toxicity-jigsaw) | [0.8967](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-toxicity-jigsaw) | [0.9067](https://huggingface.co/enguard/small-guard-32m-en-prompt-toxicity-jigsaw) | [0.8986](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-toxicity-jigsaw) |
- | [nvidia-aegis](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) | response-safety | [Collection](https://huggingface.co/collections/enguard/response-safety-nvidia-aegis) | [0.7612](https://huggingface.co/enguard/tiny-guard-2m-en-response-safety-nvidia-aegis) | [0.7760](https://huggingface.co/enguard/tiny-guard-4m-en-response-safety-nvidia-aegis) | [0.7530](https://huggingface.co/enguard/medium-guard-128m-xx-response-safety-nvidia-aegis) |
- | [nvidia-aegis](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) | prompt-safety | [Collection](https://huggingface.co/collections/enguard/prompt-safety-nvidia-aegis) | [0.7957](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-safety-nvidia-aegis) | [0.8131](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-safety-nvidia-aegis) | [0.7929](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-safety-nvidia-aegis) |
- | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | response-safety | [Collection](https://huggingface.co/collections/enguard/response-safety-polyguard) | [0.8635](https://huggingface.co/enguard/tiny-guard-2m-en-response-safety-polyguard) | [0.8808](https://huggingface.co/enguard/small-guard-32m-en-response-safety-polyguard) | [0.8753](https://huggingface.co/enguard/medium-guard-128m-xx-response-safety-polyguard) |
- | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | response-refusal | [Collection](https://huggingface.co/collections/enguard/response-refusal-polyguard) | [0.8952](https://huggingface.co/enguard/tiny-guard-2m-en-response-refusal-polyguard) | [0.9039](https://huggingface.co/enguard/tiny-guard-8m-en-response-refusal-polyguard) | [0.9015](https://huggingface.co/enguard/medium-guard-128m-xx-response-refusal-polyguard) |
- | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | prompt-safety | [Collection](https://huggingface.co/collections/enguard/prompt-safety-polyguard) | - | [0.9331](https://huggingface.co/enguard/small-guard-32m-en-prompt-safety-polyguard) | [0.9255](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-safety-polyguard) |
- | [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) | response-jailbreak | [Collection](https://huggingface.co/collections/enguard/response-jailbreak-toxic-chat) | [0.9872](https://huggingface.co/enguard/tiny-guard-2m-en-response-jailbreak-toxic-chat) | [0.9914](https://huggingface.co/enguard/small-guard-32m-en-response-jailbreak-toxic-chat) | - |
- | [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) | prompt-toxicity | [Collection](https://huggingface.co/collections/enguard/prompt-toxicity-toxic-chat) | [0.9515](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-toxicity-toxic-chat) | [0.9555](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-toxicity-toxic-chat) | - |
+ | [harmfulness-mix](https://huggingface.co/datasets/nicholasKluge/harmful-text) | prompt-harmfulness-binary | [Collection](https://huggingface.co/collections/enguard/prompt-harmfulness-binary-harmfulness-mix) | [0.9411](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-harmfulness-binary-harmfulness-mix) | [0.9413](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-harmfulness-binary-harmfulness-mix) | - |
+ | [in-the-wild](https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts) | prompt-jailbreak-binary | [Collection](https://huggingface.co/collections/enguard/prompt-jailbreak-binary-in-the-wild) | [0.9267](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-jailbreak-binary-in-the-wild) | [0.9370](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-jailbreak-binary-in-the-wild) | [0.9064](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-jailbreak-binary-in-the-wild) |
+ | [intel](https://huggingface.co/datasets/Intel/polite-guard) | general-politeness-multilabel | [Collection](https://huggingface.co/collections/enguard/general-politeness-multilabel-intel) | [0.9903](https://huggingface.co/enguard/tiny-guard-2m-en-general-politeness-multilabel-intel) | [0.9916](https://huggingface.co/enguard/small-guard-32m-en-general-politeness-multilabel-intel) | [0.9912](https://huggingface.co/enguard/medium-guard-128m-xx-general-politeness-multilabel-intel) |
+ | [intel](https://huggingface.co/datasets/Intel/polite-guard) | general-politeness-binary | [Collection](https://huggingface.co/collections/enguard/general-politeness-binary-intel) | [0.9839](https://huggingface.co/enguard/tiny-guard-2m-en-general-politeness-binary-intel) | [0.9866](https://huggingface.co/enguard/small-guard-32m-en-general-politeness-binary-intel) | [0.9816](https://huggingface.co/enguard/medium-guard-128m-xx-general-politeness-binary-intel) |
+ | [jigsaw](https://huggingface.co/datasets/google/jigsaw_toxicity_pred) | prompt-toxicity-binary | [Collection](https://huggingface.co/collections/enguard/prompt-toxicity-binary-jigsaw) | [0.9628](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-toxicity-binary-jigsaw) | [0.9628](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-toxicity-binary-jigsaw) | [0.9187](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-toxicity-binary-jigsaw) |
+ | [nvidia-aegis](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) | prompt-safety-binary | [Collection](https://huggingface.co/collections/enguard/prompt-safety-binary-nvidia-aegis) | [0.9092](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-safety-binary-nvidia-aegis) | [0.9293](https://huggingface.co/enguard/tiny-guard-4m-en-prompt-safety-binary-nvidia-aegis) | [0.8843](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-safety-binary-nvidia-aegis) |
+ | [nvidia-aegis](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) | response-safety-binary | [Collection](https://huggingface.co/collections/enguard/response-safety-binary-nvidia-aegis) | [0.8789](https://huggingface.co/enguard/tiny-guard-2m-en-response-safety-binary-nvidia-aegis) | [0.8955](https://huggingface.co/enguard/tiny-guard-4m-en-response-safety-binary-nvidia-aegis) | - |
+ | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | prompt-safety-multilabel | [Collection](https://huggingface.co/collections/enguard/prompt-safety-multilabel-polyguard) | [0.8506](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-safety-multilabel-polyguard) | [0.8910](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-safety-multilabel-polyguard) | [0.8796](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-safety-multilabel-polyguard) |
+ | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | response-refusal-binary | [Collection](https://huggingface.co/collections/enguard/response-refusal-binary-polyguard) | [0.9650](https://huggingface.co/enguard/tiny-guard-2m-en-response-refusal-binary-polyguard) | [0.9723](https://huggingface.co/enguard/tiny-guard-8m-en-response-refusal-binary-polyguard) | [0.9496](https://huggingface.co/enguard/medium-guard-128m-xx-response-refusal-binary-polyguard) |
+ | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | prompt-safety-binary | [Collection](https://huggingface.co/collections/enguard/prompt-safety-binary-polyguard) | [0.9728](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-safety-binary-polyguard) | [0.9788](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-safety-binary-polyguard) | [0.9609](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-safety-binary-polyguard) |
+ | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | response-safety-binary | [Collection](https://huggingface.co/collections/enguard/response-safety-binary-polyguard) | [0.9642](https://huggingface.co/enguard/tiny-guard-2m-en-response-safety-binary-polyguard) | [0.9747](https://huggingface.co/enguard/tiny-guard-8m-en-response-safety-binary-polyguard) | [0.9405](https://huggingface.co/enguard/medium-guard-128m-xx-response-safety-binary-polyguard) |
+ | [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) | response-safety-multilabel | [Collection](https://huggingface.co/collections/enguard/response-safety-multilabel-polyguard) | [0.7956](https://huggingface.co/enguard/tiny-guard-2m-en-response-safety-multilabel-polyguard) | [0.8192](https://huggingface.co/enguard/tiny-guard-8m-en-response-safety-multilabel-polyguard) | [0.8028](https://huggingface.co/enguard/medium-guard-128m-xx-response-safety-multilabel-polyguard) |
+ | [sok](https://huggingface.co/datasets/youbin2014/JailbreakDB) | prompt-jailbreak-binary | [Collection](https://huggingface.co/collections/enguard/prompt-jailbreak-binary-sok) | [0.9896](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-jailbreak-binary-sok) | [0.9896](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-jailbreak-binary-sok) | [0.9890](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-jailbreak-binary-sok) |
+ | [toxic-chat](https://huggingface.co/datasets/lmsys/toxic-chat) | prompt-toxicity | [Collection](https://huggingface.co/collections/enguard/prompt-toxicity-toxic-chat) | [0.6782](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-toxicity-toxic-chat) | [0.7194](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-toxicity-toxic-chat) | - |
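
Each cell in the score columns links to the Hub model id that achieved that score, so the table doubles as a lookup from task to model. As a minimal, self-contained sketch of that lookup (the helper name `best_model_id` is ours, not part of any enguard tooling; the row text is copied verbatim from the table above):

```python
import re

# One row copied from the table above: polyguard / prompt-safety-binary.
ROW = (
    "| [polyguard](https://huggingface.co/datasets/ToxicityPrompts/PolyGuardMix) "
    "| prompt-safety-binary "
    "| [Collection](https://huggingface.co/collections/enguard/prompt-safety-binary-polyguard) "
    "| [0.9728](https://huggingface.co/enguard/tiny-guard-2m-en-prompt-safety-binary-polyguard) "
    "| [0.9788](https://huggingface.co/enguard/tiny-guard-8m-en-prompt-safety-binary-polyguard) "
    "| [0.9609](https://huggingface.co/enguard/medium-guard-128m-xx-prompt-safety-binary-polyguard) |"
)

def best_model_id(row: str) -> str:
    """Return the Hub model id linked in the 'Best Performing' column.

    Cells, in order: Dataset, Classifies, Collection, Smallest (2m),
    Best Performing, Multi-lingual (128m).
    """
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    match = re.search(r"\(https://huggingface\.co/(enguard/[^)]+)\)", cells[4])
    if match is None:
        raise ValueError("no model link in the Best Performing cell")
    return match.group(1)

print(best_model_id(ROW))
# → enguard/tiny-guard-8m-en-prompt-safety-binary-polyguard
```

The extracted id can then be passed to whatever loader the model card for that repo prescribes.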