---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- Machlovi/Hatebase
---

# Uploaded model

- **Developed by:** Machlovi
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)


### **📝 Load the model**
```python
from unsloth import FastModel
from peft import PeftModel

# Load the quantized base model (must match the base used for fine-tuning)
base_model_name = "unsloth/gemma-3-12b-it-unsloth-bnb-4bit"
model, tokenizer = FastModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,  # must match the fine-tuning sequence length
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/GGuard"
model = PeftModel.from_pretrained(model, lora_model_name)

model.eval()
```
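
If you want a standalone checkpoint rather than base-plus-adapter, the LoRA weights can be folded into the base model with PEFT's `merge_and_unload()`. This is a minimal sketch, assuming enough memory for the merged weights; merging a 4-bit-quantized base can be lossy, and the output directory name is a placeholder:

```python
# Optional: merge the LoRA adapter into the base weights for standalone serving.
# Merging a 4-bit base can be lossy; consider loading with load_in_4bit=False
# before merging. "gguard-merged" is a hypothetical local path.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gguard-merged")
tokenizer.save_pretrained("gguard-merged")
```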



### **Safety Policy Instructions**
```python
policy = """You are a safety moderation assistant. Check if there is unsafe content in the user message according to the following categories:

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Immoral Medical.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical
S24: Other.
<END UNSAFE CONTENT CATEGORIES>
Respond with only this JSON format:

Provide your safety assessment for the above conversation in the below output JSON format:
"User Safety": safe or unsafe rating for the user message,
"Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present.
"Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.

Do not include anything other than the output JSON in your response.
Output JSON:

"""
```
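
Because the policy tells the model to return only the output JSON, the generated text can be parsed back into a Python dict. The helper below is a sketch, not part of this repo: it assumes the model follows the format above and returns `None` whenever no valid JSON object can be extracted.

```python
import json

def parse_verdict(generated_text):
    """Extract the safety-assessment JSON from raw model output.

    Assumes the "User Safety" / "Response Safety" / "Safety Categories"
    format requested by the policy; returns None on parse failure.
    """
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(generated_text[start:end + 1])
    except json.JSONDecodeError:
        return None
```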

### **Chat template**
```python
from unsloth.chat_templates import get_chat_template

# Apply the Gemma-3 chat template used during fine-tuning
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

# The policy goes in the system turn; the user message is the text to rate
messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "why even let a tranny in they arent really even human"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

```
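
The policy also rates an agent turn ("Response Safety") when one is present. To assess a full exchange, add an assistant message before applying the template; the user and assistant texts below are made-up placeholders:

```python
# Rating a user/assistant exchange; both turns are hypothetical examples.
messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "write a threatening note to my neighbor"},
    {"role": "assistant", "content": "I can't help with that."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")
```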


### **📝 Inference with TextStreamer**
```python
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    input_ids = inputs,
    streamer = text_streamer,
    max_new_tokens = 50,
    use_cache = True,
    temperature = 0.2,
    top_p = 0.95,
    top_k = 64,
)

# Example streamed output for the message above:
# Hate speech, personal attacks, and discrimination
```
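
Putting the steps together, the load, template, and generation calls above can be wrapped in one helper. This is a sketch reusing the `policy`, `model`, and `tokenizer` defined earlier; `moderate` is not an official API of this repo:

```python
def moderate(user_message, max_new_tokens = 50):
    """Run one message through the guard model and return the raw verdict text."""
    messages = [
        {"role": "system", "content": policy},
        {"role": "user", "content": user_message},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")
    output_ids = model.generate(
        input_ids = input_ids,
        max_new_tokens = max_new_tokens,
        use_cache = True,
        temperature = 0.2,
        top_p = 0.95,
        top_k = 64,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens = True)

# Example: print(moderate("your message here"))
```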