---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- GeneralReasoning/GeneralThought-430K
- isaiahbjork/cot-logic-reasoning
---

# Uploaded model

- **Developed by:** alibidaran
- **License:** apache-2.0
- **Fine-tuned from model:** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
- **Fine-tuned with the SFT algorithm**

## Direct usage

```python
from transformers import TextStreamer
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048   # Choose any! Unsloth auto-supports RoPE scaling internally.
dtype = None            # None for auto detection; torch.float16 for Tesla T4/V100, torch.bfloat16 for Ampere+.
load_in_4bit = True     # Load the model in 4-bit to reduce memory usage.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "alibidaran/LLAMA3-instructive_reasoning",
    max_seq_length = max_seq_length,
    # dtype = dtype,
    load_in_4bit = load_in_4bit,
    # fast_inference = True,        # Enable vLLM fast inference
    max_lora_rank = 128,
    gpu_memory_utilization = 0.6,   # Reduce if out of memory
    # token = "hf_...",             # Needed only for gated models like meta-llama/Llama-2-7b-hf
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

system_prompt = """
You are a reasoning expert who thinks and then answers the user's question.
Before responding, first think and create a chain of thought in your mind, then respond to the client.
Your chain of thought and reflection must be in the .. format and your response should be in the .. format.
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How many r's are in the word 'strawberry'?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,  # Must be added for generation
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
    input_ids = inputs,
    streamer = text_streamer,
    max_new_tokens = 2048,
    use_cache = True,
    temperature = 0.7,
    min_p = 0.9,
)
```

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[](https://github.com/unslothai/unsloth)
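## Inference without Unsloth (sketch)

For environments where Unsloth is not installed, the model can also be loaded with plain 🤗 Transformers. This is a minimal sketch, not part of the original card: it assumes the repository hosts merged model weights (if it only contains LoRA adapters, load it with `peft.AutoPeftModelForCausalLM` instead) and it reuses the same system prompt as the Unsloth example above.

```python
# Hedged sketch: plain-Transformers inference, assuming the repo holds merged weights.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "alibidaran/LLAMA3-instructive_reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use torch.float16 on pre-Ampere GPUs
    device_map="auto",
)

system_prompt = """
You are a reasoning expert who thinks and then answers the user's question.
Before responding, first think and create a chain of thought in your mind, then respond to the client.
Your chain of thought and reflection must be in the .. format and your response should be in the .. format.
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How many r's are in the word 'strawberry'?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Example SFT setup (sketch)

The card states the model was fine-tuned from the 4-bit Unsloth base with the SFT algorithm on the listed reasoning datasets. The sketch below shows how such a run could be set up with Unsloth and TRL's `SFTTrainer`; the LoRA rank, hyperparameters, and the `"text"` column name are illustrative assumptions, not the exact training configuration used for this checkpoint.

```python
# Hedged sketch of an Unsloth + TRL SFT run; hyperparameters and dataset field are assumptions.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,  # assumed to match the max_lora_rank used at inference above
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# One of the datasets listed in the card; the "text" column is hypothetical and depends
# on how the chat examples were rendered into prompts before training.
dataset = load_dataset("GeneralReasoning/GeneralThought-430K", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()

# Save the LoRA adapters (or merge and push) after training.
model.save_pretrained("LLAMA3-instructive_reasoning")
```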