Llama-3.2-3B-Math-LoRA (Reasoning Model)

This model is a fine-tuned version of Llama-3.2-3B-Instruct, specialized in solving mathematical problems using a reflective, iterative reasoning process. It was trained using Unsloth to mimic human-like stream-of-consciousness thinking, similar to the DeepSeek-R1 approach.

Key Features

  • Reflective Thinking: The model explores the problem, expresses self-doubt, and refines its logic before providing a final answer.
  • Efficient Fine-tuning: Trained using LoRA (Low-Rank Adaptation) in 4-bit quantization.
  • Math Specialist: Optimized with the OpenR1-Math-220k dataset to handle algebraic and arithmetic logic.
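To illustrate the LoRA idea mentioned above: instead of updating a full weight matrix, LoRA trains two small low-rank matrices whose product is added to the frozen weight. A minimal pure-Python sketch (toy dimensions, not this model's actual config):

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

d_in, d_out, r = 8, 8, 2   # r << d: the "low rank" in Low-Rank Adaptation
random.seed(0)
rand_mat = lambda rows, cols: [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

W = rand_mat(d_out, d_in)               # frozen pretrained weight
A = rand_mat(r, d_in)                   # trainable down-projection
B = [[0.0] * r for _ in range(d_out)]   # trainable up-projection, zero-initialized

x = [random.gauss(0, 1) for _ in range(d_in)]
y = [w + b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]  # adapted forward pass

# With B zero-initialized, the adapter starts as an exact no-op; only
# A and B (r*d_in + d_out*r parameters) are trained, not the full d_out*d_in.
assert y == matvec(W, x)
```

This is why the repository ships only a small adapter rather than full model weights.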

Training Metrics

The model was trained for 60 steps on a single NVIDIA T4 GPU (Google Colab).

Metric                  Value
Training loss (final)   ~0.85
Learning rate           2e-4
Optimizer               AdamW (8-bit)
Batch size              1 (with 8 gradient accumulation steps)
Precision               4-bit quantization
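A batch size of 1 with 8 gradient accumulation steps gives an effective batch size of 8: gradients are averaged over 8 micro-batches before each optimizer step, simulating a larger batch on a memory-limited GPU like the T4. A toy sketch of the mechanism (scalar "parameter" and made-up data, not the actual training loop):

```python
# Gradient accumulation: average gradients over accum_steps micro-batches,
# then take one optimizer step, as if the batch were accum_steps times larger.
micro_batch = 1
accum_steps = 8
effective_batch = micro_batch * accum_steps  # -> 8

w = 0.0       # toy scalar parameter
lr = 2e-4     # learning rate from the table
grad = 0.0
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # one effective batch

for i, x in enumerate(data, 1):
    grad += 2 * (w - x) / accum_steps   # d/dw of (w - x)^2, averaged over the batch
    if i % accum_steps == 0:
        w -= lr * grad                  # one optimizer step per 8 micro-batches
        grad = 0.0
```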

How to Use (Inference Code)

Because this is a LoRA adapter rather than a full model, load it with Unsloth using the code below. Make sure unsloth is installed first.

Installation & Dependencies

To run this model you need a CUDA-capable GPU environment (such as Google Colab or a local machine with an NVIDIA GPU). Install the dependencies with the following commands (the ! prefix is for notebook cells; drop it in a terminal):

!pip install -q huggingface_hub
!pip install -q unsloth
!pip install -q --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Once the dependencies are installed, load the adapter and run inference in Python:
from unsloth import FastLanguageModel
import torch

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "rimon-dutta/Llama-3.2-3B-Math-LoRA",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Reasoning System Prompt
r1_prompt = """You are a reflective assistant engaging in thorough, iterative reasoning.
<problem>
{}
</problem>
"""

# Test Question
problem_text = "Find all real values of x that satisfy the equation: 2^(x+3) + 2^x = 72."
messages = [{"role": "user", "content": r1_prompt.format(problem_text)}]

# Apply the chat template and move the token IDs to the GPU
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

# use_cache=True enables the KV cache, which speeds up generation
outputs = model.generate(input_ids=inputs, max_new_tokens=1024, use_cache=True)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
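Note that `generate` returns the prompt tokens followed by the completion, so decoding the full output reprints the prompt. To show only the model's reasoning, slice off the prompt tokens first. A toy sketch of the slicing (made-up token IDs standing in for the real tensors):

```python
# generate() output = prompt tokens + newly generated tokens,
# so slice off the prompt length before decoding the completion.
prompt_ids = [101, 2054, 2003]            # stands in for the `inputs` row
output_ids = prompt_ids + [2190, 3437]    # stands in for `outputs[0]`

completion_ids = output_ids[len(prompt_ids):]
assert completion_ids == [2190, 3437]

# With the real objects from the snippet above, the equivalent is:
# completion = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
```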
