---
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
library_name: peft
tags:
- llama-3
- reasoning
- chain-of-thought
- unsloth
- trl
- logic
license: llama3
---

# Llama 3 8B - Thinking V2

This model is a specialized LoRA fine-tune of `Meta-Llama-3-8B-Instruct` designed to enforce **Chain-of-Thought (CoT) reasoning** before providing a final answer. By utilizing specialized `<thinking>` tags, the model pauses to break down logic puzzles, coding problems, and tricky phrasing (like riddles and state-tracking) before generating its response.

## 🧠 Model Details
* **Base Model:** Meta Llama 3 8B Instruct
* **Fine-Tuning Method:** LoRA (Low-Rank Adaptation) via Unsloth
* **Dataset:** 475 hand-curated logic, math, and reasoning puzzles.
* **Epochs:** 3
* **Primary Goal:** To force "System 2" thinking, reducing hallucinations and impulsive errors on complex prompts.

## 🚀 How to Use

**CRITICAL:** To trigger the reasoning engine, your prompt **must** be formatted to anticipate the `<thinking>` tag. If you do not prompt the model correctly, it may bypass the reasoning phase and act like a standard Llama 3 model.

### Inference Code (Python / Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eelixir/llama3-8b-thinking-v2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

test_question = "A farmer has 17 sheep. All but 9 run away. How many sheep are left?"

# Note the intentional inclusion of <thinking>\n at the end to "prime" the reasoning!
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{test_question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n<thinking>\n"

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])