Instructions to use eelixir/llama3-8b-thinking-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use eelixir/llama3-8b-thinking-v2 with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/llama-3-8b-instruct-bnb-4bit") model = PeftModel.from_pretrained(base_model, "eelixir/llama3-8b-thinking-v2") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- Unsloth Studio
How to use eelixir/llama3-8b-thinking-v2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="eelixir/llama3-8b-thinking-v2", max_seq_length=2048, )
How to use from
Unsloth StudioInstall Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eelixir/llama3-8b-thinking-v2 to start chattingUsing HuggingFace Spaces for Unsloth
# No setup required# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for eelixir/llama3-8b-thinking-v2 to start chattingLoad model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="eelixir/llama3-8b-thinking-v2",
max_seq_length=2048,
)Quick Links
Llama 3 8B - Thinking V2
This model is a specialized LoRA fine-tune of Meta-Llama-3-8B-Instruct designed to enforce Chain-of-Thought (CoT) reasoning before providing a final answer. By utilizing specialized <thinking> tags, the model pauses to break down logic puzzles, coding problems, and tricky phrasing (like riddles and state-tracking) before generating its response.
π§ Model Details
- Base Model: Meta Llama 3 8B Instruct
- Fine-Tuning Method: LoRA (Low-Rank Adaptation) via Unsloth
- Dataset: 475 hand-curated logic, math, and reasoning puzzles.
- Epochs: 3
- Primary Goal: To force "System 2" thinking, reducing hallucinations and impulsive errors on complex prompts.
π How to Use
CRITICAL: To trigger the reasoning engine, your prompt must be formatted to anticipate the <thinking> tag. If you do not prompt the model correctly, it may bypass the reasoning phase and act like a standard Llama 3 model.
Inference Code (Python / Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "eelixir/llama3-8b-thinking-v2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
test_question = "A farmer has 17 sheep. All but 9 run away. How many sheep are left?"
# Note the intentional inclusion of <thinking>\n at the end to "prime" the reasoning!
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{test_question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n<thinking>\n"
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
- Downloads last month
- -
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support
Model tree for eelixir/llama3-8b-thinking-v2
Base model
unsloth/llama-3-8b-Instruct-bnb-4bit
Install Unsloth Studio (macOS, Linux, WSL)
# Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for eelixir/llama3-8b-thinking-v2 to start chatting