alibidaran commited on
Commit
94a80eb
·
verified ·
1 Parent(s): 8a8c0d4

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +42 -0
README.md CHANGED
@@ -16,6 +16,48 @@ language:
16
  - **Developed by:** alibidaran
17
  - **License:** apache-2.0
18
  - **Finetuned from model :** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
 
20
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
21
 
 
16
  - **Developed by:** alibidaran
17
  - **License:** apache-2.0
18
  - **Finetuned from model :** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
19
+ - **Finedtuned with SFT Algorithm**
20
+ ## Direct Usages:
21
+ from transformers import TextStreamer
22
+ from unsloth import FastLanguageModel
23
+ import torch
24
+ max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
25
+ dtype = 'Bfloat16' # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
26
+ load_in_4bit = True
27
+ model, tokenizer = FastLanguageModel.from_pretrained(
28
+ model_name ="alibidaran/LLAMA3-instructive_reasoning",
29
+ max_seq_length = max_seq_length,
30
+ #dtype = dtype,
31
+ load_in_4bit = load_in_4bit,
32
+ #fast_inference = True, # Enable vLLM fast inference
33
+ max_lora_rank = 128,
34
+ gpu_memory_utilization = 0.6, # Reduce if out of memory
35
+ # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
36
+ )
37
+ FastLanguageModel.for_inference(model) # Enable native 2x faster inference
38
+ system_prompt="""
39
+ You are a reasonable expert who thinks and answer the users question.
40
+ Before respond first think and create a chain of thoughts in your mind.
41
+ Then respond to the client.
42
+ Your chain of thought and reflection must be in <thinking>..</thinking> format and your respond
43
+ should be in the <output>..</output> format.
44
+ """
45
+
46
+ messages = [
47
+ {'role':'system','content':system_prompt},
48
+ {"role": "user", "content":'How many r has the word of strawberry?' },
49
+
50
+ ]
51
+ inputs = tokenizer.apply_chat_template(
52
+ messages,
53
+ tokenize = True,
54
+ add_generation_prompt = True, # Must add for generation
55
+ return_tensors = "pt",
56
+ ).to("cuda")
57
+
58
+ text_streamer = TextStreamer(tokenizer, skip_prompt = True)
59
+ _ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens =2048,
60
+ use_cache = True, temperature = 0.7, min_p = 0.9)
61
 
62
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
63