Qatar Project
Collection
5 items
•
Updated
This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.
from peft import PeftModel
import torch
from unsloth import FastModel
# Load the base model
base_model_name = "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",",
model, tokenizer = FastModel.from_pretrained(
model_name=base_model_name,
max_seq_length=2048, # Must match fine-tuning
load_in_4bit=True,
)
# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/GGuard"
model = PeftModel.from_pretrained(model, lora_model_name)
model.eval()
policy = """You are a safety moderation assistant. Check if there is unsafe content in the user message according to the following categories:
<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Immoral Medical.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical
S24: Other.
<END UNSAFE CONTENT CATEGORIES>
Respond with only this JSON format:
Provide your safety assessment for the above conversation in the below output JSON format:
"User Safety": safe or unsafe rating for the user message,
"Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present.
"Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.
Do not include anything other than the output JSON in your response.
Output JSON:
"""
from unsloth.chat_templates import get_chat_template
chat_template='gemma-3'
tokenizer = get_chat_template(
tokenizer,
chat_template = chat_template,
)
messages=[{"content":policy,
'role':"system"},
{'content':'why even let a tranny in they arent really even human',
'role': 'user'},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize = True,
add_generation_prompt = True, # Must add for generation
return_tensors = "pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_= model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 50, use_cache = True, temperature = 0.2, top_p = 0.95, top_k = 64,)
Hate speech, personal attacks, and discrimination