Model Card for thuanan/Llama-3.2-1B-Instruct-mathqa-lora

LoRA adapter for math instruction following, fine-tuned from the 4-bit quantized Llama 3.2 1B Instruct checkpoint.

Model Details

Model Description

This model is a PEFT/LoRA adapter trained to produce math problem-solving responses with step-by-step reasoning and a concise final answer. It was trained with an Unsloth + TRL SFT workflow and pushed to the Hugging Face Hub.

  • Developed by: ThuanNaN / project contributors
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: thuanan
  • Model type: Causal language model adapter (LoRA) for instruction-following generation
  • Language(s) (NLP): English
  • License: [More Information Needed]
  • Finetuned from model [optional]: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

Model Sources [optional]

Uses

Direct Use

  • Math question answering in chat-style assistants
  • Educational reasoning-style responses for math instructions

Downstream Use [optional]

  • Can be mounted as an adapter in vLLM/Transformers serving stacks
  • Can be integrated into tutoring or evaluation workflows with output verification

Out-of-Scope Use

  • High-stakes decision-making where mathematically incorrect outputs can cause harm
  • Automated grading/assessment without human review
  • Domains requiring formal symbolic guarantees

Bias, Risks, and Limitations

  • The model can still produce arithmetic and reasoning errors.
  • The model may hallucinate invalid steps while sounding confident.
  • Training used only a subset of the full MathInstruct data.
  • Because the adapter sits on a 1B-parameter base model, performance may degrade on complex multi-step tasks.

Recommendations

  • Verify final answers with deterministic tools or human review.
  • Use constrained decoding and post-checking for critical tasks.
  • Add guardrails for uncertainty disclosure in user-facing apps.
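As a minimal sketch of the first recommendation, the snippet below checks a proposed answer to a linear equation by exact substitution. The helper name check_linear_solution is hypothetical and only covers equations of the form a*x + b = c; a real deployment would need an answer parser and a more general checker.

```python
from fractions import Fraction


def check_linear_solution(a, b, c, x):
    """Return True if x satisfies a*x + b == c, using exact rational arithmetic."""
    return Fraction(a) * Fraction(x) + Fraction(b) == Fraction(c)


# Example taken from the prompt used in this card: 2x + 5 = 17
print(check_linear_solution(2, 5, 17, 6))  # True
print(check_linear_solution(2, 5, 17, 7))  # False
```

Exact fractions avoid floating-point rounding, so a correct answer is never rejected because of representation error.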

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"
adapter_id = "thuanan/Llama-3.2-1B-Instruct-mathqa-lora"

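# Load the tokenizer from the adapter repo and the 4-bit base model, then attach the LoRA weights.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()  # switch to evaluation mode (disables dropout)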

messages = [
    {
        "role": "system",
        "content": "You are a helpful math tutor. Solve the problem with clear reasoning and end with a concise final answer.",
    },
    {"role": "user", "content": "Solve: 2x + 5 = 17"},
]

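# The chat template already inserts the BOS token, so do not add special tokens again when tokenizing.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)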

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        repetition_penalty=1.1,
    )

# Drop the prompt tokens and decode only the newly generated continuation.
generated = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Training Details

Training Data

  • Dataset: TIGER-Lab/MathInstruct
  • Split strategy: 100 samples held out for validation; 3% of the remaining train split sampled for training
  • Fields used: instruction, output
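The split strategy above could look roughly like the following sketch, which works on row indices only. The function name split_indices, the use of a shuffle, and reusing seed 42 for the split are assumptions; the card does not say how the samples were drawn.

```python
import random


def split_indices(n_rows, n_val=100, train_fraction=0.03, seed=42):
    """Hold out n_val rows for validation, then sample a fraction of the rest for training."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)

    val_idx = indices[:n_val]           # held-out validation rows
    remaining = indices[n_val:]         # everything not held out
    n_train = round(len(remaining) * train_fraction)
    train_idx = remaining[:n_train]     # 3% subset used for training
    return train_idx, val_idx


# Illustrative row count, not the real dataset size.
train_idx, val_idx = split_indices(10_000)
print(len(train_idx), len(val_idx))
```

Taking the validation rows first guarantees the training subset never overlaps the holdout.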

Training Procedure

Training used supervised fine-tuning (SFT) with chat-formatted prompts:

  • system: math tutor instruction
  • user: problem/instruction
  • assistant: reference solution

Preprocessing [optional]

  • Converted each sample into chat conversation text via tokenizer chat template
  • Tokenized with truncation and max sequence length of 2048
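A sketch of the conversion from a dataset row to the three-role chat format described above. The helper name to_messages is hypothetical, and the system prompt is assumed to match the one in the inference example; the exact training prompt is not stated in this card.

```python
# Assumption: same system prompt as the inference example in this card.
SYSTEM_PROMPT = (
    "You are a helpful math tutor. Solve the problem with clear reasoning "
    "and end with a concise final answer."
)


def to_messages(sample, system_prompt=SYSTEM_PROMPT):
    """Map one row with `instruction` and `output` fields to chat messages."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": sample["instruction"]},
        {"role": "assistant", "content": sample["output"]},
    ]


row = {"instruction": "Solve: 2x + 5 = 17", "output": "2x = 12, so x = 6."}
for message in to_messages(row):
    print(message["role"], "->", message["content"])
```

The resulting list can then be passed to tokenizer.apply_chat_template(messages, tokenize=False) to obtain the training text, which is tokenized with truncation at 2048 tokens.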

Training Hyperparameters

  • Training regime: bf16 mixed precision when supported, otherwise fp16 mixed precision
  • Max sequence length: 2048
  • Epochs: 5
  • Learning rate: 2e-4
  • Weight decay: 0.01
  • Warmup steps: 200
  • LR scheduler: cosine
  • Per-device train batch size: 8
  • Per-device eval batch size: 8
  • Gradient accumulation steps: 2
  • Optimizer: paged_adamw_8bit
  • Evaluation strategy: steps (every 100)
  • Checkpoint save strategy: steps (every 100), keep last 2
  • Early stopping: patience=2, threshold=0.0
  • LoRA rank: 16
  • LoRA alpha: 16
  • LoRA dropout: 0
  • Seed: 42
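To make a few of these values concrete, the sketch below derives the effective batch size per device, the LoRA scaling factor (alpha / rank), and a learning-rate curve with linear warmup followed by cosine decay. The function lr_at_step and the total_steps value are hypothetical; the card does not report the total number of optimizer steps, and the formula is the common warmup-plus-half-cosine shape rather than a copy of the trainer's code.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-4, warmup_steps=200):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


effective_batch_per_device = 8 * 2  # per-device batch size x gradient accumulation steps
lora_scale = 16 / 16                # LoRA alpha / LoRA rank

print(effective_batch_per_device)   # 16
print(lora_scale)                   # 1.0

total_steps = 1000  # hypothetical value, for illustration only
for step in (0, 100, 200, 600, 1000):
    print(step, lr_at_step(step, total_steps))
```

With alpha equal to the rank, the low-rank update is added to the frozen weights without extra rescaling.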

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • 100-sample validation holdout from TIGER-Lab/MathInstruct

Factors

  • General math instruction and solution generation prompts
  • Multi-step reasoning quality and answer correctness

Metrics

  • eval_loss during validation
  • Qualitative generation inspection on held-out examples
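For readers unfamiliar with the metric, eval_loss in this kind of setup is the mean negative log-likelihood per target token, and its exponential is the perplexity. The sketch below shows that relationship on made-up token probabilities; the function name mean_token_nll is hypothetical and the numbers are not results from this model.

```python
import math


def mean_token_nll(token_probs):
    """Average negative log-likelihood over the probabilities assigned to the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Made-up probabilities the model assigns to three reference tokens.
loss = mean_token_nll([0.5, 0.25, 0.125])
perplexity = math.exp(loss)
print(loss, perplexity)
```

A lower eval_loss means the model assigns higher probability to the reference solutions, which is not the same as producing correct final answers.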

Results

  • Training tracked eval_loss and, at the end of training, kept the checkpoint with the lowest eval_loss as the best model.
  • Additional manual spot-check generation was performed in notebook inference cells.
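The interaction between best-checkpoint selection and early stopping (patience=2, threshold=0.0) can be sketched as below. This is a simplified model of the behavior, not the trainer's implementation; run_early_stopping is a hypothetical helper and the loss values are invented for illustration.

```python
def run_early_stopping(eval_losses, patience=2, threshold=0.0):
    """Return (index of best evaluation, index of the evaluation where training stops)."""
    best_loss = float("inf")
    best_index = None
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if best_loss - loss > threshold:
            # Improvement: remember this checkpoint and reset the patience counter.
            best_loss, best_index, bad_evals = loss, i, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_index, i
    return best_index, len(eval_losses) - 1


# Invented eval_loss values, one per evaluation (evaluations run every 100 steps).
losses = [1.20, 1.05, 0.98, 0.99, 1.01, 0.97]
print(run_early_stopping(losses))  # (2, 4)
```

In this example training stops after two consecutive evaluations without improvement, and the checkpoint from the third evaluation is the one kept.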

Summary

The adapter improves math instruction-following style and reasoning format for the target dataset subset, but outputs still require verification for correctness.

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

  • Base architecture: Llama 3.2 1B Instruct (4-bit quantized base checkpoint)
  • Adaptation method: LoRA on attention and MLP projection modules
  • Objective: next-token prediction under supervised instruction-following format
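As a rough size estimate, the sketch below counts the trainable LoRA parameters for rank 16. Two things are assumptions not stated in this card: the target modules (q, k, v, o, gate, up, and down projections) and the layer dimensions, which are taken from the public Llama 3.2 1B configuration (hidden size 2048, MLP size 8192, 16 layers, key/value projection width 512).

```python
def lora_params(in_features, out_features, rank):
    """A LoRA pair adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


# Assumed dimensions for Llama 3.2 1B (not stated in this card).
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS, RANK = 2048, 512, 8192, 16, 16

# (in_features, out_features) for each assumed target module in one decoder layer.
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)  # 704512 11272192
```

Under these assumptions the adapter trains about 11.3 million parameters, a small fraction of the base model's weights.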

Compute Infrastructure

[More Information Needed]

Hardware

  • CUDA GPU expected for training (bf16 if supported)

Software

  • PyTorch 2.10.0+cu130
  • Unsloth
  • TRL
  • Transformers
  • Datasets
  • PEFT

Citation [optional]

BibTeX:

@misc{aio_llmops_mathqa_lora_2026,
  title={Llama-3.2-1B-Instruct-mathqa-lora},
  author={ThuanNaN and contributors},
  year={2026},
  howpublished={\url{https://huggingface.co/thuanan/Llama-3.2-1B-Instruct-mathqa-lora}}
}

APA:

ThuanNaN, & contributors. (2026). Llama-3.2-1B-Instruct-mathqa-lora. Hugging Face. https://huggingface.co/thuanan/Llama-3.2-1B-Instruct-mathqa-lora

Glossary [optional]

  • LoRA: Low-Rank Adaptation for parameter-efficient fine-tuning
  • SFT: Supervised Fine-Tuning
  • PEFT: Parameter-Efficient Fine-Tuning

More Information [optional]

The training workflow is documented in notebooks/math_qa.ipynb within the aio-llmops repository.

Model Card Authors [optional]

ThuanNaN / aio-llmops contributors

Model Card Contact

[More Information Needed]
