---
library_name: transformers
tags:
- unsloth
- llama
- llama-3.2
- text-generation
- reasoning
- chain-of-thought
- lora
base_model: unsloth/Llama-3.2-3B-Instruct
datasets:
- ServiceNow-AI/R1-Distill-SFT
license: llama3.2
language:
- en
---

# Model Card for FinetunedLAMAtoR1-001-3B

## Technical Specifications

### Model Architecture and Objective
- **Base Model:** Llama-3.2-3B-Instruct
- **Architecture:** Causal decoder-only transformer
- **Hidden Size:** 3072
- **Layers:** 28
- **Attention Heads:** 24
- **Parameters:** ~3.21B (loaded in 4-bit quantization)
- **Precision:** float16 (during LoRA training and inference)

### Compute Infrastructure
- **Hardware:** Tesla T4 GPU (Google Colab)
- **VRAM Usage:** ~2.24 GB (model) + training overhead
- **Quantization:** 4-bit (QLoRA) via `bitsandbytes` (example configuration below)

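Only the facts above ("4-bit (QLoRA) via `bitsandbytes`", float16) are documented; the exact quantization settings were not published. The following is a minimal sketch of a typical QLoRA-style loading configuration, where the `nf4` quant type and double quantization are assumptions rather than confirmed training settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed QLoRA-style 4-bit config; only "4-bit via bitsandbytes" is documented.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # assumption, not a confirmed setting
    bnb_4bit_use_double_quant=True,        # assumption, not a confirmed setting
    bnb_4bit_compute_dtype=torch.float16,  # matches the float16 precision above
)

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```
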
### Model Weights
- **Type:** LoRA adapter (PEFT); see the loading sketch below
- **Adapter File Size:** ~92 MB
- **Total Saved Size:** ~108 MB

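This repository ships the LoRA adapter rather than full model weights. A minimal sketch of attaching the adapter to the base model with `peft`, assuming it is stored in standard PEFT format, looks like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the ~92 MB LoRA adapter from this repo.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base, "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B")

# Optionally merge the adapter into the base weights to get a standalone model.
model = model.merge_and_unload()
```
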
## Model Details

### Model Description

This model is a fine-tuned version of **[unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)** designed to mimic reflective, human-like stream-of-consciousness reasoning. It was trained using **[Unsloth](https://github.com/unslothai/unsloth)** on the **[ServiceNow-AI/R1-Distill-SFT](https://huggingface.co/datasets/ServiceNow-AI/R1-Distill-SFT)** dataset.

The model uses a specific system prompt to trigger a "thinking" (chain-of-thought) phase before giving the final answer, aiming to replicate the reasoning behavior seen in models such as DeepSeek-R1.

- **Developed by:** Muhammad Shaheer Khan
- **Model type:** Causal Language Model (LoRA fine-tune)
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct

## Uses

### Direct Use

The model is intended for reasoning tasks where explainability and step-by-step logic are required, such as math problems, logic puzzles, and complex queries that benefit from iterative thought.

**System Prompt:**
To activate the reasoning capabilities, you must use the following system prompt:

> "You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer."

## How to Get Started with the Model

You can use the model with the `unsloth` library for 2x faster inference, or with standard Hugging Face `transformers` and `peft`.

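### Using Hugging Face Transformers + PEFT

A minimal sketch using plain `transformers` and `peft` is shown below. It assumes the adapter loads with `AutoPeftModelForCausalLM`, that the base tokenizer's built-in Llama 3 chat template matches the one used for fine-tuning, and it mirrors the sampling settings of the Unsloth example:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the base model with the LoRA adapter from this repo applied on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Tokenizer comes from the base repo (assumed to match this adapter).
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-3B-Instruct")

sys_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
"""

messages = [{"role": "user", "content": sys_prompt.format("If a dozen eggs cost $60, how much does one egg cost?")}]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=1024,
    do_sample=True,   # enable sampling so temperature/min_p take effect
    temperature=1.5,
    min_p=0.1,        # min_p sampling requires a recent transformers release
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
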
### Using Unsloth (Recommended)

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Load the fine-tuned model and tokenizer in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Muhammad-Shaheer/FinetunedLAMAtoR1-001-3B",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Format prompts with the Llama-3.1 chat template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

# Reasoning prompt: the user question is wrapped in <problem> tags
sys_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning, mimicking human stream-of-consciousness thinking. Your approach emphasizes exploration, self-doubt, and continuous refinement before coming up with an answer.
<problem>
{}
</problem>
"""

message = sys_prompt.format("If a dozen eggs cost $60, how much does one egg cost?")

messages = [{"role": "user", "content": message}]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    input_ids = inputs,
    max_new_tokens = 1024,
    use_cache = True,
    temperature = 1.5,
    min_p = 0.1,
)
print(tokenizer.batch_decode(outputs))