hbseong/HarmAug-Guard


hbseong committed on Oct 11, 2024

Commit f2e8429 (verified) · 1 Parent(s): 8f8c7e2

Update README.md

Files changed (1)

README.md +11 -11


@@ -1,17 +1,17 @@
----
-tags:
-- deberta-v3
-- deberta
-- deberta-v2
-license: mit
-base_model:
-- microsoft/deberta-v3-large
-pipeline_tag: text-classification
----
 # HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.

+---
+tags:
+- deberta-v3
+- deberta
+- deberta-v2
+license: mit
+base_model:
+- microsoft/deberta-v3-large
+pipeline_tag: text-classification
+---
 # HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
+[Arxiv Link](https://arxiv.org/abs/2410.01524)
 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
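
The README text above describes a DeBERTa-based sequence classifier that scores conversations for safety. The following is a minimal sketch of how such a model might be queried, not the authors' reference code. It assumes the repository ID `hbseong/HarmAug-Guard` (taken from this page), that the checkpoint loads with the standard `transformers` sequence-classification classes, and that class index 1 means "unsafe"; the label order is an assumption and should be checked against `model.config.id2label`. The helper names `unsafe_probability` and `score_conversation` are hypothetical.

```python
import math

MODEL_ID = "hbseong/HarmAug-Guard"  # repository name as shown on this page


def unsafe_probability(logits, unsafe_index=1):
    """Softmax over the class logits; return the probability of the class
    at `unsafe_index` (ASSUMPTION: index 1 is the 'unsafe' label)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[unsafe_index] / sum(exps)


def score_conversation(prompt, response=None, model_id=MODEL_ID):
    """Score a prompt, or a (prompt, response) pair, with the guard model.

    Requires `transformers` and `torch`, and downloads the checkpoint on
    first use, so the imports are kept local to this function.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Encode the prompt alone, or the prompt and response as a text pair.
    if response is None:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    else:
        inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return unsafe_probability(logits)
```

Calling `score_conversation("some user prompt")` would return a value between 0 and 1. Any threshold for flagging a conversation is a deployment choice and is not specified in the text above.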