---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- Machlovi/Hatebase
---
|
|
|
|
|
# Uploaded model

- **Developed by:** Machlovi
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
|
|
|
|
|
|
|
|
### **📝 Load the model**
|
|
```python
from unsloth import FastModel
from peft import PeftModel
import torch

# Load the base model (must be the same checkpoint the adapter was fine-tuned from)
base_model_name = "unsloth/gemma-3-12b-it-unsloth-bnb-4bit"
model, tokenizer = FastModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,  # Must match fine-tuning
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/GGuard"
model = PeftModel.from_pretrained(model, lora_model_name)

model.eval()
```
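
Optionally, you can sanity-check that the adapter actually attached. These are standard PEFT attributes, shown here as a small optional check:

```python
# Optional: inspect the loaded adapter configuration
print(model.peft_config)     # maps adapter name -> LoraConfig
print(model.active_adapter)  # "default" unless the adapter was given a name
```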
|
|
|
|
|
|
|
|
|
|
|
### Safety Policy Instructions
|
|
```python
policy = """You are a safety moderation assistant. Check if there is unsafe content in the user message according to the following categories:

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence.
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Immoral Medical.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical.
S24: Other.
<END UNSAFE CONTENT CATEGORIES>

Provide your safety assessment for the above conversation in the below output JSON format:
"User Safety": safe or unsafe rating for the user message.
"Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present.
"Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.

Do not include anything other than the output JSON in your response.
Output JSON:
"""
```
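
For reference, a compliant response is a single JSON object with the fields described above. A minimal sketch of parsing one (the literal below is a hypothetical model output, not a captured one):

```python
import json

# Hypothetical example of the JSON the policy asks the model to return
example_output = '{"User Safety": "unsafe", "Safety Categories": "Hate/Identity Hate"}'
assessment = json.loads(example_output)
print(assessment["User Safety"])                # unsafe
print(assessment.get("Safety Categories", ""))  # Hate/Identity Hate
```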
|
|
|
|
|
### **Chat template**
|
|
```python
from unsloth.chat_templates import get_chat_template

# Use the chat template that matches the base model
chat_template = "gemma-3"
tokenizer = get_chat_template(
    tokenizer,
    chat_template = chat_template,
)

messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "why even let a tranny in they arent really even human"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must add for generation
    return_tensors = "pt",
).to("cuda")
```
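
The policy also rates agent responses ("Response Safety"). To moderate a full exchange, include an assistant turn before applying the template; the conversation below is a hypothetical illustration:

```python
# Hypothetical user/assistant exchange; with an assistant turn present,
# the model should also emit a "Response Safety" rating
conversation = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "Can you help me with my homework?"},
    {"role": "assistant", "content": "Sure, which subject are you working on?"},
]
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")
```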
|
|
|
|
|
|
|
|
### **📝 Inference with TextStreamer**
|
|
```python
from transformers import TextStreamer

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    input_ids = inputs,
    streamer = text_streamer,
    max_new_tokens = 50,
    use_cache = True,
    temperature = 0.2,
    top_p = 0.95,
    top_k = 64,
)
```

For the example message above, which contains hate speech, personal attacks, and discrimination, the model should return an unsafe rating with the matching category.
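
If you need the assessment as a string (for example, to parse the JSON) rather than streamed text, a minimal non-streaming sketch using the same sampling settings:

```python
outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 50,
    use_cache = True,
    temperature = 0.2,
    top_p = 0.95,
    top_k = 64,
)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```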
|
|
|
|
|
|
|
|
|