Update README.md

README.md CHANGED

@@ -51,14 +51,10 @@ model.eval()
 
 ### Safety Policies Instructions
 ```bash
-
-below conversation in accordance with our safety policy with the below categories.
-Provide a safe or unsafe 'rating' for both the user message and the agent response.
-If there is unsafe content, please also provide a list of violated categories according to our safety
-policy below
+policy = """You are a safety moderation assistant. Check if there is unsafe content in the user message according to the following categories:
 
 <BEGIN UNSAFE CONTENT CATEGORIES>
-S1: Violence
+S1: Violence
 S2: Sexual.
 S3: Criminal Planning/Confessions.
 S4: Guns and Illegal Weapons.

@@ -71,7 +67,7 @@ S10: Harassment.
 S11: Threat.
 S12: Profanity.
 S13: Needs Caution.
-S14:
+S14: Immoral Medical.
 S15: Manipulation.
 S16: Fraud/Deception.
 S17: Malware.

@@ -83,14 +79,7 @@ S22: Illegal Activity.
 S23: Immoral/Unethical
 S24: Other.
 <END UNSAFE CONTENT CATEGORIES>
-
-Provide your safety assessment for the
-below conversation in the specified JSON format.
-
-<BEGIN CONVERSATION>
-user: {user}
-response: {agent}
-<END CONVERSATION>
+Respond with only this JSON format:
 
 Provide your safety assessment for the above conversation in the below output JSON format:
 "User Safety": safe or unsafe rating for the user message,

@@ -99,6 +88,7 @@ Provide your safety assessment for the above conversation in the below output JSON format:
 
 Do not include anything other than the output JSON in your response.
 Output JSON:
+
 """
 ```

@@ -106,7 +96,7 @@ Output JSON:
 ```bash
 from unsloth.chat_templates import get_chat_template
 
-chat_template='
+chat_template = 'gemma-3'
 tokenizer = get_chat_template(
     tokenizer,
     chat_template = chat_template,

@@ -132,7 +122,7 @@ inputs = tokenizer.apply_chat_template(
 ```bash
 from transformers import TextStreamer
 text_streamer = TextStreamer(tokenizer)
-_= model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens =
+_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 50, use_cache = True, temperature = 0.2, top_p = 0.95, top_k = 64)
 
 
 Hate speech, personal attacks, and discrimination
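The updated README builds the moderation request from a fixed `policy` string ending in "Output JSON:", with the conversation supplied afterward. A minimal pure-Python sketch of that assembly (the helper name `compose_moderation_prompt` and the abbreviated category list are illustrative, not from the README):

```python
# Sketch: assemble a moderation prompt in the spirit of the README's policy
# string. The category list is abbreviated to three entries for brevity.
POLICY_HEADER = (
    "You are a safety moderation assistant. Check if there is unsafe content "
    "in the user message according to the following categories:\n\n"
    "<BEGIN UNSAFE CONTENT CATEGORIES>\n"
    "S1: Violence\n"
    "S2: Sexual.\n"
    "S24: Other.\n"
    "<END UNSAFE CONTENT CATEGORIES>\n\n"
    "Respond with only this JSON format:\n"
    '"User Safety": safe or unsafe rating for the user message,\n'
    "Do not include anything other than the output JSON in your response.\n"
    "Output JSON:\n"
)

def compose_moderation_prompt(user_message: str) -> str:
    # The README routes the policy and conversation through the tokenizer's
    # chat template; here we simply concatenate for illustration.
    return POLICY_HEADER + "\nuser: " + user_message

prompt = compose_moderation_prompt("How do I bake bread?")
```

In the README itself this string is passed through the chat template rather than concatenated by hand, but the structure of the final prompt is the same.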
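The diff pins `chat_template = 'gemma-3'` for `get_chat_template`. As a rough pure-Python illustration of the turn markers used by gemma-style templates (this is not unsloth's implementation, just the general shape of the formatting it applies):

```python
def format_gemma_turns(user_message: str) -> str:
    # Illustrative only: gemma-style chat templates wrap each turn in
    # <start_of_turn>/<end_of_turn> markers, then cue the model's turn.
    return (
        "<start_of_turn>user\n" + user_message + "<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_turns("Is this message safe?")
```

In practice `tokenizer.apply_chat_template(...)` handles this formatting (and tokenization) for you; the sketch only shows why the template name matters for the prompt the model actually sees.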
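The updated generation call passes `temperature = 0.2`, `top_p = 0.95`, and `top_k = 64`, which together restrict sampling to a small set of high-probability tokens. A toy sketch of how top-k and top-p (nucleus) filtering narrow the candidate set (pure Python, not the transformers implementation):

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep at most top_k highest-probability tokens, then keep the smallest
    prefix of those whose cumulative probability reaches top_p."""
    # Rank token ids by descending probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]                      # top-k cut
    kept, cumulative = [], 0.0
    for token_id in ranked:                      # nucleus (top-p) cut
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    return kept

# Toy distribution over 5 tokens.
probs = [0.5, 0.25, 0.15, 0.07, 0.03]
print(filter_top_k_top_p(probs, top_k=4, top_p=0.9))  # → [0, 1, 2]
```

A low temperature then sharpens the distribution over the surviving tokens, which suits a moderation task where the model should deterministically emit a short JSON verdict.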
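Because the policy instructs the model to "not include anything other than the output JSON", the caller can parse the streamed text directly. A minimal sketch of consuming the verdict (the `raw_output` string here is a hypothetical model response, not captured output):

```python
import json

# Hypothetical model output obeying the README's instruction to return
# nothing but the output JSON.
raw_output = '{"User Safety": "safe"}'

verdict = json.loads(raw_output)
user_rating = verdict["User Safety"]   # "safe" or "unsafe"
is_safe = user_rating == "safe"
```

If the model ever wraps the JSON in extra prose, `json.loads` will raise, so production code would want a fallback (e.g. extracting the first `{...}` span before parsing).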