hbseong/HarmAug-Guard

Text Classification

hbseong committed on Oct 14, 2024

Commit dda1e93 · verified · 1 Parent(s): 00e3207

Update README.md

Files changed (1)

README.md +1 -1

@@ -17,7 +17,7 @@ Seanie Lee*, Haebin Seong*, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagn
 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
-The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**].
 For more information, please refer to our [github](https://github.com/imnotkind/HarmAug)

 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
+The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**](https://drive.google.com/drive/folders/1oLUMPauXYtEBP7rvbULXL4hHp9Ck_yqg).
 For more information, please refer to our [github](https://github.com/imnotkind/HarmAug)