Rimon-Math-3B-V1

Rimon-Math-3B-V1 is a specialized 3-billion-parameter causal language model, fine-tuned for high-accuracy mathematical reasoning and logical problem-solving. Built on the Llama-3.2-3B-Instruct architecture and optimized using the Unsloth framework, this model excels at generating structured, step-by-step (chain-of-thought) solutions.

Highlights

  • Reasoning Focused: Trained specifically to break down complex problems into logical steps.
  • Lightweight & Efficient: Optimized for consumer-grade GPUs (T4, RTX 3060+) and edge deployment.
  • High Compatibility: Works seamlessly with transformers, vLLM, and supports GGUF conversion for local use.

Model Capabilities

The model is fine-tuned to handle various mathematical domains:

  • Algebra: Solving equations, inequalities, and systems of equations.
  • Calculus: Derivatives, integrals, and limit problems.
  • Geometry & Trigonometry: Properties of shapes and trigonometric identities.
  • Logic & Arithmetic: Multi-step word problems and sequence analysis.

Training Metrics (Approximate)

| Epoch | Step | Training Loss | Validation Loss | Learning Rate |
|-------|------|---------------|-----------------|---------------|
| 1.0   | 1000 | 0.7104        | 0.6952          | 1.5e-4        |
| 2.0   | 2000 | 0.5911        | 0.5843          | 5.0e-5        |
| 3.0   | 3000 | 0.5244        | 0.5102          | 1.0e-5        |

Usage Guide

Installation & Dependencies

To run Rimon-Math-3B-V1 efficiently, ensure you have the latest versions of the following libraries installed. Run this command in your terminal or a notebook cell:

pip install -U transformers torch accelerate bitsandbytes sentencepiece

Hardware Requirements

| Component | Minimum (4-bit)                  | Recommended (16-bit)          |
|-----------|----------------------------------|-------------------------------|
| GPU       | NVIDIA T4 / RTX 3050 (4 GB VRAM) | RTX 3060 / A100 (12 GB+ VRAM) |
| RAM       | 8 GB system RAM                  | 16 GB system RAM              |
| CUDA      | 11.8 or higher                   | 12.1 or higher                |
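Before loading the model, it can save time to confirm the dependencies above are actually present. The helper below is an illustrative sketch (not part of the model card's official instructions); it uses the standard library to report which required packages are missing from the current environment:

```python
from importlib import metadata

def missing_packages(names):
    """Return the subset of `names` not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

required = ["transformers", "torch", "accelerate", "bitsandbytes", "sentencepiece"]
print(missing_packages(required) or "All dependencies installed.")
```

If any package is listed as missing, rerun the `pip install -U ...` command above before proceeding.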

How to Use the Model

You can load the model in two different modes depending on your hardware resources.

Option 1: 4-bit Quantization (Low VRAM Mode)

Best for users on Google Colab (Free T4) or laptops with limited GPU memory. This uses only ~3.5 GB of VRAM.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "rimon-dutta/Rimon-Math-3B-V1"

# 4-bit Configuration for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
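The ~3.5 GB figure can be sanity-checked with a back-of-the-envelope calculation: NF4 stores weights at roughly 0.5 bytes per parameter, with the KV cache, activations, and CUDA context adding headroom on top. The overhead value below is an illustrative assumption, not a measured number:

```python
def estimate_4bit_vram_gb(num_params, overhead_gb=2.0):
    """Rough VRAM estimate for NF4-quantized weights plus runtime overhead.

    num_params  -- total parameter count of the model
    overhead_gb -- assumed headroom for KV cache, activations, and CUDA
                   context (illustrative, not measured)
    """
    weight_bytes = num_params * 0.5  # NF4 ≈ 4 bits = 0.5 bytes per parameter
    return weight_bytes / 1e9 + overhead_gb

# 3B parameters → ~1.5 GB of weights + ~2 GB overhead ≈ 3.5 GB
print(f"{estimate_4bit_vram_gb(3e9):.1f} GB")
```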

Option 2: 16-bit Full Precision (High Accuracy Mode)

Best for users with 8GB+ VRAM (e.g., RTX 3060 12GB or higher). This provides the most precise mathematical reasoning.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rimon-dutta/Rimon-Math-3B-V1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)

Running Inference (Example)

Once the model is loaded, you can solve math problems using the standard Llama 3.2 chat template.

# Define your math problem
messages = [
    {"role": "system", "content": "You are a specialized math tutor. Explain step-by-step."},
    {"role": "user", "content": "If x + 1/x = 3, find the value of x^5 + 1/x^5."}
]

# Apply the chat template
inputs = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True, 
    return_dict=True,       # return a dict so it can be unpacked into generate()
    return_tensors="pt"
).to(model.device)

# Generate the response
outputs = model.generate(
    **inputs, 
    max_new_tokens=1024, 
    temperature=0.1, # Low temperature is crucial for math accuracy
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
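The example problem has a closed-form check. With a_n = x^n + 1/x^n, expanding (x + 1/x)(x^n + 1/x^n) gives the recurrence a_{n+1} = a_1 · a_n − a_{n−1}, so you can verify the model's final answer without solving for x:

```python
def power_sum(a1, n):
    """Compute a_n = x**n + 1/x**n given a_1 = x + 1/x, using the
    recurrence a_{k+1} = a_1 * a_k - a_{k-1}, with a_0 = 2."""
    prev, curr = 2, a1  # a_0 = x^0 + x^0 = 2; a_1 is given
    for _ in range(n - 1):
        prev, curr = curr, a1 * curr - prev
    return curr

print(power_sum(3, 5))  # x + 1/x = 3 → x^5 + 1/x^5 = 123
```

A correct chain-of-thought response should arrive at 123 via the intermediate values a_2 = 7, a_3 = 18, and a_4 = 47.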

Troubleshooting Guide

  1. GPU Memory Error (OOM): If you get an "Out of Memory" error, restart your runtime and use Option 1 (4-bit).

  2. BitsAndBytes Issues: If load_in_4bit fails, ensure you are running in a Linux-based environment (or WSL2 on Windows) and that bitsandbytes is up to date:

     pip install -U bitsandbytes

  3. CUDA Mismatch: If you encounter a runtime error about CUDA versions, reinstall PyTorch with the correct index URL:

     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Prompt Engineering Tips

  • Use a system prompt to control reasoning style.
  • Keep temperature between 0.1 and 0.3 for math tasks.
  • Always request a step-by-step explanation.
  • Avoid ambiguous wording in problems.
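These tips can be folded into a small helper that builds the messages list consistently. This is an illustrative sketch (the helper name and default system prompt wording are assumptions, not part of the model card):

```python
def build_math_prompt(problem, system_prompt=None):
    """Assemble a chat-format message list following the tips above:
    a reasoning-style system prompt plus an explicit step-by-step request."""
    if system_prompt is None:
        system_prompt = "You are a specialized math tutor. Explain step-by-step."
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{problem}\nShow your reasoning step by step."},
    ]

messages = build_math_prompt("If x + 1/x = 3, find the value of x^5 + 1/x^5.")
# Pass `messages` to tokenizer.apply_chat_template as in the inference example,
# keeping temperature in the 0.1-0.3 range.
```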

Author

Rimon Dutta
DevOps Engineer | AI & ML Learner
Kotwali, Bangladesh
