Machlovi committed
Commit 42a6121 · verified · 1 Parent(s): c50650b

Update README.md

Files changed (1):
  1. README.md +62 -0
README.md CHANGED
@@ -9,6 +9,8 @@ tags:
  license: apache-2.0
  language:
  - en
+ datasets:
+ - Machlovi/Hatebase
  ---
 
  # Uploaded model
@@ -20,3 +22,63 @@ language:
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+ ### **📝 Load and Run Inference**
+ ```python
+ from unsloth import FastLanguageModel
+ from peft import PeftModel
+ import torch
+
+ # Load the 4-bit quantized base model
+ base_model_name = "unsloth/Meta-Llama-3.1-8B-Instruct"
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name=base_model_name,
+     max_seq_length=2048,  # Must match the value used during fine-tuning
+     load_in_4bit=True,
+ )
+
+ # Load the fine-tuned LoRA adapter on top of the base model
+ lora_model_name = "Machlovi/Llama3.8_HateBaseInst"
+ model = PeftModel.from_pretrained(model, lora_model_name)
+
+ # Run inference
+ input_text = "Why do we need to go to see something?"
+ inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
+
+ with torch.no_grad():
+     outputs = model.generate(**inputs, max_new_tokens=4)
+
+ # Decode and print the response
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
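+
+ Optionally, Unsloth can patch the loaded model for faster generation. A minimal sketch, assuming the `model` loaded above and an installed Unsloth version that exposes `FastLanguageModel.for_inference`:
+ ```python
+ # Switch the model into Unsloth's optimized inference mode
+ # (assumption: for_inference is available in your Unsloth version)
+ FastLanguageModel.for_inference(model)
+ ```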
+
+ ### **📝 Inference with TextStreamer**
+ ```python
+ from unsloth.chat_templates import get_chat_template
+ from transformers import TextStreamer
+
+ messages = [
+     {'content': 'why even let a tranny in they arent really even human',
+      'role': 'user'},
+ ]
+
+ tokenizer = get_chat_template(
+     tokenizer,
+     chat_template = "llama-3",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
+     # mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},  # ShareGPT style
+ )
+
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize = True,
+     add_generation_prompt = True,  # Must add for generation
+     return_tensors = "pt",
+ ).to("cuda")
+
+ # Stream tokens to stdout as they are generated
+ text_streamer = TextStreamer(tokenizer)
+ _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 10, use_cache = True)
+ ```
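+
+ To capture the generated text as a string instead of streaming it, decode only the newly generated tokens. A minimal sketch, reusing `model`, `tokenizer`, and `inputs` from the block above:
+ ```python
+ outputs = model.generate(input_ids = inputs, max_new_tokens = 10, use_cache = True)
+ # generate() returns the prompt followed by the new tokens; skip the prompt portion
+ response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```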