# Llama-3.2-3B-Math-LoRA (Reasoning Model)
This model is a fine-tuned version of Llama-3.2-3B-Instruct, specialized in solving mathematical problems using a reflective, iterative reasoning process. It was trained using Unsloth to mimic human-like stream-of-consciousness thinking, similar to the DeepSeek-R1 approach.
## Key Features
- Reflective Thinking: The model explores the problem, expresses self-doubt, and refines its logic before providing a final answer.
- Efficient Fine-tuning: Trained using LoRA (Low-Rank Adaptation) in 4-bit quantization.
- Math Specialist: Optimized on the `OpenR1-Math-220k` dataset to handle algebraic and arithmetic reasoning.
## Training Metrics
The model was trained for 60 steps on a single NVIDIA T4 GPU (Google Colab).
| Metric | Value |
|---|---|
| Training Loss (Final) | ~0.85 |
| Learning Rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Batch Size | 1 (with 8 gradient accumulation steps) |
| Precision | 4-bit Quantization |
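The hyperparameters in the table map directly onto a standard fine-tuning configuration. As a sketch (the argument names follow the TRL/`transformers` `TrainingArguments` convention; treat them as an assumption, not the exact script used):

```python
# Hypothetical reconstruction of the training hyperparameters from the
# table above, using TRL/transformers-style argument names.
training_args = dict(
    per_device_train_batch_size = 1,   # Batch Size
    gradient_accumulation_steps = 8,   # gradient accumulation steps
    max_steps = 60,                    # total optimizer steps
    learning_rate = 2e-4,              # Learning Rate
    optim = "adamw_8bit",              # AdamW 8-bit optimizer
)

# Effective batch size seen by each optimizer step:
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)
```

With a per-device batch of 1 and 8 accumulation steps, each optimizer update effectively sees 8 sequences, which is how a 3B model fits on a single T4.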
## How to Use (Inference Code)
Since this is a LoRA adapter, you can load it using the following code. Make sure you have unsloth installed.
### Installation & Dependencies

To run this model, you need a GPU environment (e.g. Google Colab or a local CUDA GPU). Install the dependencies with the following commands (the leading `!` is for notebook cells; drop it in a regular shell):

```bash
!pip install -q huggingface_hub
!pip install -q unsloth
!pip install -q --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
```
```python
from unsloth import FastLanguageModel
import torch

# Load the base model with the LoRA adapter in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "rimon-dutta/Llama-3.2-3B-Math-LoRA",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch to optimized inference mode

# Reasoning system prompt: the model expects the problem wrapped in <problem> tags
r1_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning.
<problem>
{}
</problem>
"""

# Test question
problem_text = "Find all real values of x that satisfy the equation: 2^(x+3) + 2^x = 72."
messages = [{"role": "user", "content": r1_prompt.format(problem_text)}]

# Apply the chat template and move the token IDs to the GPU
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

# Generate and decode the model's reasoning
outputs = model.generate(input_ids=inputs, max_new_tokens=1024, use_cache=False)
print(tokenizer.batch_decode(outputs)[0])
```
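For reference, the test equation has a short closed-form solution: 2^(x+3) + 2^x = 8·2^x + 2^x = 9·2^x = 72, so 2^x = 8 and x = 3. A quick sanity check in plain Python (this verifies the math, not the model's output):

```python
# 2^(x+3) + 2^x = 8 * 2^x + 2^x = 9 * 2^x = 72  =>  2^x = 8  =>  x = 3
x = 3
assert 2 ** (x + 3) + 2 ** x == 72
print("x =", x)  # x = 3
```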
## Model tree for rimon-dutta/Llama-3.2-3B-Math-LoRA

- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned from: unsloth/Llama-3.2-3B-Instruct