Upload README.md #2
opened by Seanie-lee
README.md CHANGED
@@ -16,7 +16,7 @@ library_name: transformers
 
 Our model functions as a Guard Model, intended to classify the safety of conversations with LLMs and protect against LLM jailbreak attacks.
 It is fine-tuned from DeBERTa-v3-large and trained using **HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models**.
-The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**]
+The training process involves knowledge distillation paired with data augmentation, using our [**HarmAug Generated Dataset**].
 
 
 For more information, please refer to our [github](https://github.com/imnotkind/HarmAug)
@@ -44,7 +44,7 @@ model.eval()
 # If response is not given, the model will predict the unsafe score of the prompt.
 # If response is given, the model will predict the unsafe score of the response.
 def predict(model, prompt, response=None):
-    device = model.device
+    device = model.device
     if response == None:
         inputs = tokenizer(prompt, return_tensors="pt")
     else:
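The second hunk cuts off inside `predict`, so the way the unsafe score is computed is not visible in this diff. The sketch below only illustrates the behavior the two comments describe: score the prompt alone when no response is given, otherwise score the response in the context of the prompt. It is not the README's implementation. The names `softmax`, `unsafe_score`, `toy_logits`, and `unsafe_index`, as well as the choice of a two-class softmax with class 1 meaning "unsafe", are assumptions made for illustration, and the toy scorer stands in for the real tokenizer and model.

```python
import math
from typing import Callable, List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unsafe_score(
    score_fn: Callable[..., List[float]],
    prompt: str,
    response: Optional[str] = None,
    unsafe_index: int = 1,  # assumption: class 1 is the "unsafe" label
) -> float:
    """Return the probability of the unsafe class.

    Without a response, only the prompt is scored; with a response,
    the (prompt, response) pair is scored together.
    """
    if response is None:
        logits = score_fn(prompt)
    else:
        logits = score_fn(prompt, response)
    return softmax(logits)[unsafe_index]


def toy_logits(prompt: str, response: Optional[str] = None) -> List[float]:
    """Hypothetical stand-in for tokenizer + model: flags one keyword."""
    text = prompt if response is None else response
    flagged = "forbidden" in text.lower()
    return [0.0, 2.0] if flagged else [2.0, 0.0]


prompt_only = unsafe_score(toy_logits, "How do I bake bread?")
with_response = unsafe_score(
    toy_logits, "How do I bake bread?", "Here is something forbidden."
)
```

With the real model, `score_fn` would presumably wrap the tokenizer call and the forward pass shown in the README, and `is None` would be the more idiomatic comparison than `== None`.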