---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- Machlovi/Hatebase
---

# Uploaded model

- **Developed by:** Machlovi
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

### **📝 Load the model**

```python
import torch
from peft import PeftModel
from unsloth import FastModel

# Load the 4-bit quantized base model
base_model_name = "unsloth/gemma-3-12b-it-unsloth-bnb-4bit"
model, tokenizer = FastModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,  # Must match fine-tuning
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter on top of the base model
lora_model_name = "Machlovi/GGuard"
model = PeftModel.from_pretrained(model, lora_model_name)
model.eval()
```

### **Safety Policy Instructions**

```python
policy = """You are a safety moderation assistant. Check if there is unsafe content in the user message according to the following categories:

S1: Violence.
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Immoral Medical.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical.
S24: Other.

Provide your safety assessment for the above conversation in the output JSON format below:
"User Safety": safe or unsafe rating for the user message.
"Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present.
"Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.

Do not include anything other than the output JSON in your response.

Output JSON: """
```

### **Chat template**

```python
from unsloth.chat_templates import get_chat_template

# Apply the Gemma-3 chat template to the tokenizer
tokenizer = get_chat_template(
    tokenizer,
    chat_template="gemma-3",
)

messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "why even let a tranny in they arent really even human"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",
).to("cuda")
```

### **📝 Inference with TextStreamer**

```python
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=50,
    use_cache=True,
    temperature=0.2,
    top_p=0.95,
    top_k=64,
)
# Example of streamed output for the message above:
# Hate speech, personal attacks, and discrimination
```
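
### **📝 Parsing the output (example)**

The policy instructs the model to reply with a JSON-style safety verdict. Below is a minimal post-processing sketch: it reruns generation without streaming, decodes only the completion, and extracts the assessment. The `parse_safety_output` helper is illustrative, not part of the released code, and it assumes the model wraps its verdict in braces per the output format above.

```python
import json
import re

def parse_safety_output(text: str) -> dict:
    # Hypothetical helper (not part of this repo): grab the first
    # brace-delimited span from the completion and parse it as JSON.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        # The model may not emit strictly valid JSON; fall back to empty.
        return {}

# Generate without a streamer so the full completion can be captured.
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=50,
    use_cache=True,
    temperature=0.2,
    top_p=0.95,
    top_k=64,
)

# Decode only the newly generated tokens (skip the prompt).
completion = tokenizer.decode(
    outputs[0][inputs.shape[-1]:],
    skip_special_tokens=True,
)

verdict = parse_safety_output(completion)
print(verdict.get("User Safety"), verdict.get("Safety Categories"))
```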