---
library_name: transformers
tags:
- unsloth
- llama
- llama-3.2
- text-generation
- reasoning
- chain-of-thought
- lora
base_model: unsloth/Llama-3.2-3B-Instruct
datasets:
- ServiceNow-AI/R1-Distill-SFT
license: llama3.2
language:
- en
---
# Model Card for FinetunedLAMAtoR1-001-3B
## Technical Specifications
### Model Architecture and Objective
- **Base Model:** Llama-3.2-3B-Instruct
- **Architecture:** Causal Decoder-Only Transformer
- **Hidden Size:** 3072
- **Layers:** 28
- **Heads:** 24
- **Parameters:** ~3.21B (loaded in 4-bit quantization)
- **Precision:** float16 compute during LoRA training and inference
### Compute Infrastructure
- **Hardware:** Tesla T4 GPU (Google Colab)
- **VRAM Usage:** ~2.24 GB (Model) + Training Overhead
- **Quantization:** 4-bit (QLoRA) via `bitsandbytes`
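As a rough sanity check on the footprint (a back-of-envelope sketch, not the exact loader math): the 4-bit weights alone account for about 1.6 GB, and the remainder of the reported ~2.24 GB comes from layers kept in 16-bit (e.g. embeddings and norms) plus quantization constants.

```python
# Back-of-envelope VRAM estimate for the 4-bit weights alone
# (ignores 16-bit embeddings/norms and quantization constants).
n_params = 3.21e9
bits_per_weight = 4
weight_gb = n_params * bits_per_weight / 8 / 1e9  # roughly 1.6 GB
print(f"~{weight_gb:.1f} GB of the ~2.24 GB reported")
```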
### Model Weights
- **Type:** LoRA Adapter (PEFT)
- **Adapter File Size:** ~92 MB
- **Total Saved Size:** ~108 MB
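The adapter file size implies roughly 46M trainable parameters if the LoRA weights are stored in float16 (2 bytes each), i.e. about 1.4% of the base model. A sketch of that arithmetic:

```python
# Trainable-parameter count implied by the adapter file size,
# assuming float16 storage (2 bytes per parameter).
adapter_bytes = 92e6
lora_params = adapter_bytes / 2      # ~46M parameters
fraction = lora_params / 3.21e9      # ~1.4% of the base model
print(f"~{lora_params/1e6:.0f}M params, ~{fraction:.1%} of base")
```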
### Model Description
This model is a fine-tuned version of **[unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)** designed to mimic reflective, human-like stream-of-consciousness reasoning. It was trained using **[Unsloth](https://github.com/unslothai/unsloth)** on the **[ServiceNow-AI/R1-Distill-SFT](https://huggingface.co/datasets/ServiceNow-AI/R1-Distill-SFT)** dataset.
The model utilizes a specific system prompt to trigger a "thinking" process (Chain of Thought) before providing the final answer, aiming to replicate the reasoning capabilities seen in models like DeepSeek-R1.
- **Developed by:** Muhammad Shaheer Khan
- **Model type:** Causal Language Model (LoRA Fine-tune)
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct
## Uses
### Direct Use
The model is intended for reasoning tasks where explainability and step-by-step logic are required: math problems, logic puzzles, and complex queries that benefit from iterative thought.
**System Prompt:**
To activate the reasoning capabilities, you must use the following system prompt:
> "You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer."
## How to Get Started with the Model
You can use the model with the `unsloth` library for 2x faster inference, or standard Hugging Face `transformers`.
### Using Unsloth (Recommended)
```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B",
    max_seq_length = 2048,
    dtype = None,        # auto-detect (float16 on a T4)
    load_in_4bit = True,
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

# The reasoning prompt the adapter was trained with; the question
# goes inside the <problem> tags.
sys_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
"""

message = sys_prompt.format("If a dozen eggs cost $60, how much does one egg cost?")
messages = [{"role": "user", "content": message}]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 1024,
    use_cache = True,
    temperature = 1.5,
    min_p = 0.1,
)
print(tokenizer.batch_decode(outputs)[0])
```
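### Using Transformers

Alternatively, here is a minimal sketch using plain `transformers` (with `peft` installed, so the LoRA adapter repo loads on top of the base model automatically). The loading arguments are assumptions; adjust them for your `transformers` version and hardware.

```python
MODEL_ID = "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B"

# The reasoning prompt template the adapter was trained with.
SYS_PROMPT = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
"""

def build_messages(question: str) -> list:
    """Wrap a question in the trained prompt template."""
    return [{"role": "user", "content": SYS_PROMPT.format(question)}]

def main():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # With `peft` installed, transformers resolves the adapter repo and
    # loads the LoRA weights onto the base model automatically.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("If a dozen eggs cost $60, how much does one egg cost?"),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=1024)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# main()  # uncomment to run (requires a GPU and downloads the model)
```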