eelixir
/

llama3-8b-thinking-v2

chain-of-thought

Model card Files Files and versions

llama3-8b-thinking-v2 / README.md

eelixir's picture

Update README.md

a5de9ed verified about 2 months ago

|

history blame contribute delete

1.94 kB

	---
	base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
	library_name: peft
	tags:
	- llama-3
	- reasoning
	- chain-of-thought
	- unsloth
	- trl
	- logic
	license: llama3
	---

	# Llama 3 8B - Thinking V2

	This model is a specialized LoRA fine-tune of `Meta-Llama-3-8B-Instruct` designed to enforce Chain-of-Thought (CoT) reasoning before providing a final answer. By utilizing specialized `<thinking>` tags, the model pauses to break down logic puzzles, coding problems, and tricky phrasing (like riddles and state-tracking) before generating its response.

	## 🧠 Model Details
	* Base Model: Meta Llama 3 8B Instruct
	* Fine-Tuning Method: LoRA (Low-Rank Adaptation) via Unsloth
	* Dataset: 475 hand-curated logic, math, and reasoning puzzles.
	* Epochs: 3
	* Primary Goal: To force "System 2" thinking, reducing hallucinations and impulsive errors on complex prompts.

	## 🚀 How to Use

	CRITICAL: To trigger the reasoning engine, your prompt must be formatted to anticipate the `<thinking>` tag. If you do not prompt the model correctly, it may bypass the reasoning phase and act like a standard Llama 3 model.

	### Inference Code (Python / Transformers)

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "eelixir/llama3-8b-thinking-v2"
	model = AutoModelForCausalLM.from_pretrained(model_name)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	test_question = "A farmer has 17 sheep. All but 9 run away. How many sheep are left?"

	# Note the intentional inclusion of <thinking>\n at the end to "prime" the reasoning!
	prompt = f"<\|begin_of_text\|><\|start_header_id\|>user<\|end_header_id\|>\n\n{test_question}<\|eot_id\|><\|start_header_id\|>assistant<\|end_header_id\|>\n\n<thinking>\n"

	inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
	outputs = model.generate(**inputs, max_new_tokens=1024, use_cache=True)

	print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])