Update README.md
README.md
license: apache-2.0
language:
- en
datasets:
- Machlovi/Hatebase
---

# Uploaded model

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
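As a rough reference, the snippet below sketches the kind of Unsloth + TRL setup this sort of LoRA fine-tune is produced with. It is a minimal sketch, assuming the `Machlovi/Hatebase` dataset exposes a ready-to-train `text` column; the LoRA rank, target modules, and training hyperparameters are illustrative assumptions, not the exact configuration used for this model.

```python
# Minimal sketch of an Unsloth + TRL SFT run (assumed hyperparameters, not the actual config)
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank/targets below are common Unsloth defaults, assumed here
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

dataset = load_dataset("Machlovi/Hatebase", split="train")  # assumes a "text" column

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,  # must match the value used at inference time
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```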

### **📝 Load and Run Inference**

```python
from unsloth import FastLanguageModel
from peft import PeftModel
import torch

# Load the base model
base_model_name = "unsloth/Meta-Llama-3.1-8B-Instruct"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,  # Must match fine-tuning
    load_in_4bit=True,
)

# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/Llama3.8_HateBaseInst"
model = PeftModel.from_pretrained(model, lora_model_name)

# Run inference
input_text = "Why do we need to go to see something?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4)

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
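`max_new_tokens=4` keeps the reply to a short, label-style output, which suits this classification-style use; raise it if you want longer generations. If you need a standalone checkpoint rather than base-plus-adapter, PEFT's `merge_and_unload()` can fold the LoRA weights into the base model.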

### **📝 Inference with TextStreamer**
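This variant formats the input with the Llama-3 chat template and streams tokens as they are generated. A minimal sketch, assuming the `model` and `tokenizer` from the previous section are already loaded: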
```python
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

messages = [
    {'content': 'why even let a tranny in they arent really even human',
     'role': 'user'},
]

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    # mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},  # ShareGPT style
)

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must add for generation
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 10, use_cache = True)
```
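`TextStreamer` prints tokens to stdout as soon as they are produced, so the model's short verdict appears immediately rather than after `generate` returns.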