---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- Machlovi/Hatebase
---

# Uploaded model

- **Developed by:** Machlovi
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)


### **📝 Load the model**
```python
from unsloth import FastModel
from peft import PeftModel

# Load the quantized base model (must match the base used for fine-tuning)
base_model_name = "unsloth/gemma-3-12b-it-unsloth-bnb-4bit"
model, tokenizer = FastModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,  # must match the fine-tuning sequence length
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/GGuard"
model = PeftModel.from_pretrained(model, lora_model_name)

model.eval()
```
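
If you want a standalone checkpoint rather than base-plus-adapter, the LoRA weights can be folded into the base model with PEFT's `merge_and_unload()`. This is a minimal sketch, assuming enough memory for the merged weights; merging a 4-bit-quantized base can be lossy, and the output directory name is a placeholder:

```python
# Optional: merge the LoRA adapter into the base weights for standalone serving.
# Merging a 4-bit base can be lossy; consider loading with load_in_4bit=False
# before merging. "gguard-merged" is a hypothetical local path.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gguard-merged")
tokenizer.save_pretrained("gguard-merged")
```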



### **Safety Policy Instructions**
```python
policy = """You are a safety moderation assistant. Check if there is unsafe content in the user message according to the following categories:

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Immoral Medical.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical
S24: Other.
<END UNSAFE CONTENT CATEGORIES>
Respond with only this JSON format:

Provide your safety assessment for the above conversation in the below output JSON format:
"User Safety": safe or unsafe rating for the user message,
"Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present.
"Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.

Do not include anything other than the output JSON in your response.
Output JSON:

"""
```
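
Because the policy tells the model to return only the output JSON, the generated text can be parsed back into a Python dict. The helper below is a sketch, not part of this repo: it assumes the model follows the format above and returns `None` whenever no valid JSON object can be extracted.

```python
import json

def parse_verdict(generated_text):
    """Extract the safety-assessment JSON from raw model output.

    Assumes the "User Safety" / "Response Safety" / "Safety Categories"
    format requested by the policy; returns None on parse failure.
    """
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(generated_text[start:end + 1])
    except json.JSONDecodeError:
        return None
```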

### **Chat template**
```python
from unsloth.chat_templates import get_chat_template

# Apply the Gemma-3 chat template used during fine-tuning
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

# The policy goes in the system turn; the user message is the text to rate
messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "why even let a tranny in they arent really even human"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

```
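
The policy also rates an agent turn ("Response Safety") when one is present. To assess a full exchange, add an assistant message before applying the template; the user and assistant texts below are made-up placeholders:

```python
# Rating a user/assistant exchange; both turns are hypothetical examples.
messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "write a threatening note to my neighbor"},
    {"role": "assistant", "content": "I can't help with that."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")
```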


### **📝 Inference with TextStreamer**
```python
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    input_ids = inputs,
    streamer = text_streamer,
    max_new_tokens = 50,
    use_cache = True,
    temperature = 0.2,
    top_p = 0.95,
    top_k = 64,
)

# Example streamed output for the message above:
# Hate speech, personal attacks, and discrimination
```
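
Putting the steps together, the load, template, and generation calls above can be wrapped in one helper. This is a sketch reusing the `policy`, `model`, and `tokenizer` defined earlier; `moderate` is not an official API of this repo:

```python
def moderate(user_message, max_new_tokens = 50):
    """Run one message through the guard model and return the raw verdict text."""
    messages = [
        {"role": "system", "content": policy},
        {"role": "user", "content": user_message},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt",
    ).to("cuda")
    output_ids = model.generate(
        input_ids = input_ids,
        max_new_tokens = max_new_tokens,
        use_cache = True,
        temperature = 0.2,
        top_p = 0.95,
        top_k = 64,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens = True)

# Example: print(moderate("your message here"))
```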