Instructions to use eelixir/llama3-8b-thinking-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use eelixir/llama3-8b-thinking-v2 with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b-instruct-bnb-4bit") model = PeftModel.from_pretrained(base_model, "eelixir/llama3-8b-thinking-v2") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- Unsloth Studio
How to use eelixir/llama3-8b-thinking-v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="eelixir/llama3-8b-thinking-v2", max_seq_length=2048, )
File size: 1,940 Bytes
b9530d1 7e7aa6a b9530d1 7e7aa6a b9530d1 7e7aa6a b9530d1 7e7aa6a b9530d1 7e7aa6a b9530d1 7e7aa6a a5de9ed 7e7aa6a b9530d1 7e7aa6a | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | ---
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
library_name: peft
tags:
- llama-3
- reasoning
- chain-of-thought
- unsloth
- trl
- logic
license: llama3
---
# Llama 3 8B - Thinking V2
This model is a specialized LoRA fine-tune of `Meta-Llama-3-8B-Instruct` designed to enforce **Chain-of-Thought (CoT) reasoning** before providing a final answer. By utilizing specialized `<thinking>` tags, the model pauses to break down logic puzzles, coding problems, and tricky phrasing (like riddles and state-tracking) before generating its response.
## 🧠 Model Details
* **Base Model:** Meta Llama 3 8B Instruct
* **Fine-Tuning Method:** LoRA (Low-Rank Adaptation) via Unsloth
* **Dataset:** 475 hand-curated logic, math, and reasoning puzzles.
* **Epochs:** 3
* **Primary Goal:** To force "System 2" thinking, reducing hallucinations and impulsive errors on complex prompts.
## 🚀 How to Use
**CRITICAL:** To trigger the reasoning engine, your prompt **must** be formatted to anticipate the `<thinking>` tag. If you do not prompt the model correctly, it may bypass the reasoning phase and act like a standard Llama 3 model.
### Inference Code (Python / Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "eelixir/llama3-8b-thinking-v2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
test_question = "A farmer has 17 sheep. All but 9 run away. How many sheep are left?"
# Note the intentional inclusion of <thinking>\n at the end to "prime" the reasoning!
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{test_question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n<thinking>\n"
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]) |