---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- GeneralReasoning/GeneralThought-430K
- isaiahbjork/cot-logic-reasoning
---
# Uploaded model
- **Developed by:** alibidaran
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
- **Fine-tuned with the SFT algorithm**
## Direct Usage
``` python
from transformers import TextStreamer
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. torch.float16 for Tesla T4, V100; torch.bfloat16 for Ampere+
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "alibidaran/LLAMA3-instructive_reasoning",
    max_seq_length = max_seq_length,
    #dtype = dtype,
    load_in_4bit = load_in_4bit,
    #fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = 128,
    gpu_memory_utilization = 0.6, # Reduce if out of memory
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
system_prompt="""
You are a reasonable expert who thinks and answer the users question.
Before respond first think and create a chain of thoughts in your mind.
Then respond to the client.
Your chain of thought and reflection must be in .. format and your respond
should be in the format.
"""
messages = [
    {'role':'system','content':system_prompt},
    {"role": "user", "content":'How many r has the word of strawberry?' },
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048,
                   use_cache = True, temperature = 0.7, min_p = 0.9)
```
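The system prompt asks the model to keep its chain of thought and its final response in separate delimited sections. The exact delimiters are not shown in this card, so the sketch below takes them as arguments. The helper name `split_reasoning` and the `<think>` tags in the usage lines are hypothetical placeholders, not the model's confirmed output format.

```python
def split_reasoning(text, open_tag, close_tag):
    """Split generated text into (reasoning, answer).

    open_tag and close_tag are whatever delimiters the model actually
    emits around its chain of thought; they are supplied by the caller
    and are an assumption, not something this card specifies.
    """
    start = text.find(open_tag)
    end = text.find(close_tag, start + len(open_tag)) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete reasoning section found: treat everything as the answer.
        return "", text.strip()
    reasoning = text[start + len(open_tag):end].strip()
    # Everything outside the reasoning section is kept as the answer.
    answer = (text[:start] + text[end + len(close_tag):]).strip()
    return reasoning, answer


# Placeholder tags and text, for illustration only.
sample = "<think>s-t-r-a-w-b-e-r-r-y has three r's.</think>The word has 3 r's."
reasoning, answer = split_reasoning(sample, "<think>", "</think>")
print(reasoning)
print(answer)
```

To apply this to real output, one option is to call `model.generate` without the streamer and decode the returned token IDs with `tokenizer.decode(..., skip_special_tokens=True)` before passing the text to the helper.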
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.