---
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
library_name: peft
tags:
- llama-3
- reasoning
- chain-of-thought
- unsloth
- trl
- logic
license: llama3
---

# Llama 3 8B - Thinking V2

This model is a specialized LoRA fine-tune of `Meta-Llama-3-8B-Instruct`, trained to produce explicit **Chain-of-Thought (CoT) reasoning** before giving a final answer. Inside specialized `` tags, the model works step by step through logic puzzles, coding problems, and tricky phrasing (such as riddles and state-tracking questions) before writing its response.

## 🧠 Model Details

* **Base Model:** Meta Llama 3 8B Instruct (the 4-bit quantized `unsloth/llama-3-8b-Instruct-bnb-4bit`)
* **Fine-Tuning Method:** LoRA (Low-Rank Adaptation) via Unsloth
* **Dataset:** 475 hand-curated logic, math, and reasoning puzzles
* **Epochs:** 3
* **Primary Goal:** To induce deliberate "System 2" thinking and reduce hallucinations and impulsive errors on complex prompts.

## 🚀 How to Use

**CRITICAL:** To trigger the reasoning phase, your prompt **must** end where the `` tag is expected, right after the assistant header. If the prompt is not formatted this way, the model may skip the reasoning phase and behave like a standard Llama 3 model.

### Inference Code (Python / Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eelixir/llama3-8b-thinking-v2"

# This repo holds a LoRA adapter on a 4-bit (bitsandbytes) base model, so loading it
# through Transformers needs `peft` and `bitsandbytes` installed and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

test_question = "A farmer has 17 sheep. All but 9 run away. How many sheep are left?"

# Note the intentional inclusion of \n at the end to "prime" the reasoning!
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{test_question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n\n"

# The prompt already starts with <|begin_of_text|>, so don't let the tokenizer add a second one.
inputs = tokenizer([prompt], return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)

# Decode only the newly generated tokens (reasoning + final answer), not the echoed prompt.
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0])
```
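If you query the model with many questions, a small helper can keep the Llama 3 special tokens in one place. This is a minimal sketch, not part of the released code: `header`, `build_prompt`, and the `priming` parameter are hypothetical names, and the default suffix only mirrors the string used in the inference example. It covers single-turn prompts without a system message, since the card does not say whether system prompts were used during fine-tuning.

```python
# Llama 3 special tokens, as used in the inference example's prompt string.
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Return a Llama 3 role header followed by the blank line before its content."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_prompt(question: str, priming: str = "\n") -> str:
    """Build a single-turn prompt that stops inside the assistant turn.

    `priming` (hypothetical parameter) is the text appended after the assistant
    header to nudge the model into its reasoning phase. The default "\n"
    reproduces the prompt string from the inference example.
    """
    return BOS + header("user") + question + EOT + header("assistant") + priming


if __name__ == "__main__":
    print(repr(build_prompt("A farmer has 17 sheep. All but 9 run away. How many sheep are left?")))
```

Because the returned string already begins with `<|begin_of_text|>`, pass it to the tokenizer with `add_special_tokens=False` to avoid a duplicated BOS token.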