hbseong/HarmAug-Guard

Text Classification

text-embeddings-inference

Upload README.md

#2

by Seanie-lee - opened Oct 14, 2024

base: refs/heads/main ← from: refs/pr/2

README.md +2 -2

@@ -16,7 +16,7 @@ library_name: transformers
 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
-The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**](https://drive.google.com/drive/folders/1oLUMPauXYtEBP7rvbULXL4hHp9Ck_yqg?usp=drive_link).
+The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**].
 For more information, please refer to our [github](https://github.com/imnotkind/HarmAug)
@@ -44,7 +44,7 @@ model.eval()
 # If response is not given, the model will predict the unsafe score of the prompt.
 # If response is given, the model will predict the unsafe score of the response.
 def predict(model, prompt, response=None):
-    device = model.device()
+    device = model.device
     if response == None:
         inputs = tokenizer(prompt, return_tensors="pt")
     else:
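
The second hunk replaces `model.device()` with `model.device`. On a Transformers model, `device` is a property that returns a `torch.device`, so calling the returned value raises a `TypeError`. The sketch below reproduces that failure mode in plain Python. `FakeDevice`, `FakeModel`, `read_device_old`, and `read_device_new` are hypothetical stand-ins invented for illustration; they are not part of the README or of any library.

```python
class FakeDevice:
    """Hypothetical stand-in for torch.device: a plain, non-callable value."""

    def __init__(self, name):
        self.name = name


class FakeModel:
    """Hypothetical stand-in for a model that exposes `device` as a property."""

    def __init__(self, device_name="cpu"):
        self._device = FakeDevice(device_name)

    @property
    def device(self):
        # Accessing `model.device` already returns the device object.
        return self._device


def read_device_old(model):
    # Mirrors the removed line: the property value is then called like a function.
    return model.device()


def read_device_new(model):
    # Mirrors the added line: plain attribute access, no call.
    return model.device


model = FakeModel()

try:
    read_device_old(model)
    old_call_failed = False
except TypeError:
    # FakeDevice defines no __call__, so the extra parentheses fail here.
    old_call_failed = True

new_device = read_device_new(model)
```

With the parentheses removed, `new_device` holds the device object directly, which is presumably what the rest of `predict` needs in order to move the tokenized inputs.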