Instructions to use VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

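# Load model directly (this repo holds a LoRA adapter, so the peft package must be installed)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora")
model = AutoModelForCausalLM.from_pretrained("VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora", device_map="auto")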
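
When the pipeline is called with a list of chat messages, the result is typically a list with one dictionary whose "generated_text" entry holds the whole conversation, including the new assistant turn. The helper below is a minimal sketch of pulling out that reply. The function name and the sample output are hypothetical and only mirror the expected shape; they are not real model output.

```python
from typing import Any


def last_assistant_reply(pipeline_output: list[dict[str, Any]]) -> str:
    """Return the text of the final assistant turn from a chat-style pipeline result."""
    # With chat input, "generated_text" is a list of {"role", "content"} turns.
    conversation = pipeline_output[0]["generated_text"]
    for turn in reversed(conversation):
        if turn["role"] == "assistant":
            return turn["content"]
    raise ValueError("no assistant turn found in pipeline output")


# Hypothetical output, shaped like the result of pipe(messages) for chat input.
example_output = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "I am a math tutor assistant."},
        ]
    }
]

print(last_assistant_reply(example_output))
```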

Local Apps

vLLM

How to use VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
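
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant text (assumes an OpenAI-style response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora",
    "What is the capital of France?",
)
# Uncomment once the server is running:
# print(send_chat_request(request))
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.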

Use Docker

docker model run hf.co/VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora

SGLang

How to use VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora",
    max_seq_length=2048,
)

Docker Model Runner
How to use VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora with Docker Model Runner:
```
docker model run hf.co/VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora
```

Model Card for thuanan/Llama-3.2-1B-Instruct-mathqa-lora

LoRA adapter for math instruction following, fine-tuned from the 4-bit quantized Llama 3.2 1B Instruct checkpoint (unsloth/Llama-3.2-1B-Instruct-bnb-4bit).

Model Details

Model Description

This model is a PEFT/LoRA adapter trained to produce math problem-solving responses with step-by-step reasoning and a concise final answer. It was trained with an Unsloth + TRL SFT workflow and pushed to the Hugging Face Hub.

Developed by: ThuanNaN / project contributors
Funded by [optional]: [More Information Needed]
Shared by [optional]: thuanan
Model type: Causal language model adapter (LoRA) for instruction-following generation
Language(s) (NLP): English
License: [More Information Needed]
Finetuned from model [optional]: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

Model Sources [optional]

Repository: https://github.com/ThuanNaN/aio-llmops
Paper [optional]: [More Information Needed]
Demo [optional]: [More Information Needed]

Uses

Direct Use

Math question answering in chat-style assistants
Educational reasoning-style responses for math instructions

Downstream Use [optional]

Can be mounted as an adapter in vLLM/Transformers serving stacks
Can be integrated into tutoring or evaluation workflows with output verification

Out-of-Scope Use

High-stakes decision-making where mathematically incorrect outputs can cause harm
Automated grading/assessment without human review
Domains requiring formal symbolic guarantees

Bias, Risks, and Limitations

The model can still produce arithmetic and reasoning errors.
The model may hallucinate invalid steps while sounding confident.
Training used only a subset of the full MathInstruct data.
Because the adapter sits on a 1B-parameter base model, performance may degrade on complex multi-step tasks.

Recommendations

Verify final answers with deterministic tools or human review.
Use constrained decoding and post-checking for critical tasks.
Add guardrails for uncertainty disclosure in user-facing apps.
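
As one way to verify a final answer deterministically, the sketch below extracts the last number from a response and substitutes it back into a linear equation using exact rational arithmetic. The function names, the regular expression, and the sample response are hypothetical illustrations; real outputs may need a more robust answer parser.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_final_number(text: str) -> Optional[Fraction]:
    """Return the last integer or decimal found in the text as an exact Fraction."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return Fraction(matches[-1]) if matches else None


def satisfies_linear_equation(x: Fraction, a: int, b: int, c: int) -> bool:
    """Check whether x solves a*x + b = c exactly."""
    return a * x + b == c


# Hypothetical model response for "Solve: 2x + 5 = 17".
model_output = "Subtract 5 from both sides: 2x = 12. Divide by 2: x = 6. Final answer: 6"

answer = extract_final_number(model_output)
is_verified = answer is not None and satisfies_linear_equation(answer, a=2, b=5, c=17)
print(is_verified)
```

Substituting the answer back into the problem checks correctness without trusting any of the intermediate steps the model wrote.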

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"
adapter_id = "thuanan/Llama-3.2-1B-Instruct-mathqa-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
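# The base checkpoint is pre-quantized with bitsandbytes, so the bitsandbytes package and a CUDA GPU are needed
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)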
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful math tutor. Solve the problem with clear reasoning and end with a concise final answer.",
    },
    {"role": "user", "content": "Solve: 2x + 5 = 17"},
]

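prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the BOS token, so do not add special tokens a second time
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)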

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        repetition_penalty=1.1,
    )

generated = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Training Details

Training Data

Dataset: TIGER-Lab/MathInstruct
Split strategy: 100 samples held out for validation; 3% of the remaining train split sampled for training
Fields used: instruction, output
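
The split strategy can be sketched with plain index arithmetic. This is an illustration only: the row count is hypothetical, and using seed 42 for the split is an assumption borrowed from the training seed listed on this card, not a confirmed detail of the original notebook.

```python
import random


def split_indices(
    num_rows: int,
    holdout_size: int = 100,
    train_fraction: float = 0.03,
    seed: int = 42,  # assumption: the card lists this seed for training, not necessarily for the split
) -> tuple[list[int], list[int]]:
    """Hold out a fixed number of rows for validation, then sample a fraction of the rest."""
    rng = random.Random(seed)
    indices = list(range(num_rows))
    rng.shuffle(indices)

    validation = indices[:holdout_size]
    remaining = indices[holdout_size:]
    train_size = round(len(remaining) * train_fraction)
    train = rng.sample(remaining, train_size)
    return train, validation


# Hypothetical dataset size, chosen only to keep the numbers easy to follow.
train_idx, val_idx = split_indices(num_rows=10_000)
print(len(train_idx), len(val_idx))
```

Holding out the validation rows before sampling guarantees that no validation example can end up in the training subset.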

Training Procedure

Training used supervised fine-tuning (SFT) with chat-formatted prompts:

system: math tutor instruction
user: problem/instruction
assistant: reference solution
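
The three-role layout above could be produced from a dataset row roughly as follows. The system prompt is copied from the getting-started example on this card and is only assumed to match the one used in training; `to_chat_messages` and the sample row are hypothetical.

```python
# Assumption: same system prompt as the inference example on this card.
SYSTEM_PROMPT = (
    "You are a helpful math tutor. Solve the problem with clear reasoning "
    "and end with a concise final answer."
)


def to_chat_messages(sample: dict[str, str]) -> list[dict[str, str]]:
    """Map one row with 'instruction' and 'output' fields to a three-turn chat."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": sample["instruction"]},
        {"role": "assistant", "content": sample["output"]},
    ]


# Hypothetical row using the two fields named under Training Data.
sample_row = {
    "instruction": "Solve: 2x + 5 = 17",
    "output": "2x = 12, so x = 6. Final answer: 6",
}

for message in to_chat_messages(sample_row):
    print(f"{message['role']}: {message['content']}")
```

The resulting message list is what a tokenizer chat template would then render into a single training string.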

Preprocessing [optional]

Converted each sample into chat conversation text via tokenizer chat template
Tokenized with truncation and max sequence length of 2048

Training Hyperparameters

Training regime: bf16 mixed precision when supported, otherwise fp16 mixed precision
Max sequence length: 2048
Epochs: 5
Learning rate: 2e-4
Weight decay: 0.01
Warmup steps: 200
LR scheduler: cosine
Per-device train batch size: 8
Per-device eval batch size: 8
Gradient accumulation steps: 2
Optimizer: paged_adamw_8bit
Evaluation strategy: steps (every 100)
Checkpoint save strategy: steps (every 100), keep last 2
Early stopping: patience=2, threshold=0.0
LoRA rank: 16
LoRA alpha: 16
LoRA dropout: 0
Seed: 42
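
A few derived quantities follow directly from these settings. The sketch below computes the effective batch size, the standard LoRA scaling factor alpha / rank, and an approximate number of optimizer steps per epoch. It assumes a single device and the standard (non rank-stabilized) LoRA scaling; the training-set size is hypothetical.

```python
import math

PER_DEVICE_TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
LORA_RANK = 16
LORA_ALPHA = 16


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update (alpha / rank)."""
    return alpha / rank


def approx_steps_per_epoch(num_train_samples: int, batch_size: int) -> int:
    """Approximate optimizer steps per epoch; exact trainer behaviour may differ slightly."""
    return math.ceil(num_train_samples / batch_size)


batch = effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
scale = lora_scaling(LORA_ALPHA, LORA_RANK)
steps = approx_steps_per_epoch(7_000, batch)  # 7,000 samples is a hypothetical figure
print(batch, scale, steps)
```

With alpha equal to the rank, the low-rank update is added to the frozen weights without extra rescaling.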

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

100-sample validation holdout from TIGER-Lab/MathInstruct

Factors

General math instruction and solution generation prompts
Multi-step reasoning quality and answer correctness

Metrics

eval_loss during validation
Qualitative generation inspection on held-out examples

Results

Training tracked eval_loss; the checkpoint with the lowest eval_loss was saved as the best model at the end of training.
Generations were also spot-checked manually in the notebook's inference cells.
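
The interaction between best-checkpoint selection and early stopping (patience=2, threshold=0.0) can be sketched as below. This is a simplified illustration, not the trainer's actual callback code, and the loss values are hypothetical rather than measured results.

```python
def run_early_stopping(
    eval_losses: list[float],
    patience: int = 2,
    threshold: float = 0.0,
) -> tuple[int, int]:
    """Return (index of best evaluation, index of the evaluation where training stops)."""
    best_loss = float("inf")
    best_index = -1
    evaluations_without_improvement = 0

    for index, loss in enumerate(eval_losses):
        if best_loss - loss > threshold:
            # Improvement: remember this evaluation and reset the patience counter.
            best_loss, best_index = loss, index
            evaluations_without_improvement = 0
        else:
            evaluations_without_improvement += 1
            if evaluations_without_improvement >= patience:
                return best_index, index

    return best_index, len(eval_losses) - 1


# Hypothetical eval_loss values, one per evaluation (every 100 steps on this card).
losses = [1.20, 1.05, 0.98, 0.99, 1.01, 0.97]
best, stopped = run_early_stopping(losses)
print(f"best checkpoint at step {(best + 1) * 100}, stopped at step {(stopped + 1) * 100}")
```

Note that the final value in the list is never evaluated: two consecutive non-improving evaluations end training first, which is the trade-off of a small patience.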

Summary

The adapter improves math instruction-following style and reasoning format for the target dataset subset, but outputs still require verification for correctness.

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

Base architecture: Llama 3.2 1B Instruct (4-bit quantized base checkpoint)
Adaptation method: LoRA on attention and MLP projection modules
Objective: next-token prediction under supervised instruction-following format
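
To give a sense of scale, the sketch below estimates the number of trainable LoRA parameters at rank 16. The layer count, projection shapes, and the exact list of target modules are assumptions based on the commonly published Llama 3.2 1B configuration; this card does not state them, so treat the result as an estimate.

```python
LORA_RANK = 16
NUM_LAYERS = 16  # assumption: Llama 3.2 1B decoder layer count

# Assumed (input_dim, output_dim) of each adapted projection in one decoder layer.
TARGET_MODULE_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 512),
    "v_proj": (2048, 512),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 8192),
    "up_proj": (2048, 8192),
    "down_proj": (8192, 2048),
}


def lora_param_count(input_dim: int, output_dim: int, rank: int) -> int:
    """LoRA adds A (rank x input_dim) and B (output_dim x rank) to each adapted weight."""
    return rank * (input_dim + output_dim)


params_per_layer = sum(
    lora_param_count(in_dim, out_dim, LORA_RANK)
    for in_dim, out_dim in TARGET_MODULE_SHAPES.values()
)
total_lora_params = NUM_LAYERS * params_per_layer
print(f"{total_lora_params:,} trainable LoRA parameters")
```

Under these assumptions the adapter trains on the order of ten million parameters, a small fraction of the base model's roughly one billion.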

Compute Infrastructure

[More Information Needed]

Hardware

CUDA GPU expected for training (bf16 if supported)

Software

PyTorch 2.10.0+cu130
Unsloth
TRL
Transformers
Datasets
PEFT

Citation [optional]

BibTeX:

@misc{aio_llmops_mathqa_lora_2026,
  title={Llama-3.2-1B-Instruct-mathqa-lora},
  author={ThuanNaN and contributors},
  year={2026},
  howpublished={\url{https://huggingface.co/thuanan/Llama-3.2-1B-Instruct-mathqa-lora}}
}

APA:

ThuanNaN, & contributors. (2026). Llama-3.2-1B-Instruct-mathqa-lora. Hugging Face. https://huggingface.co/thuanan/Llama-3.2-1B-Instruct-mathqa-lora

Glossary [optional]

LoRA: Low-Rank Adaptation for parameter-efficient fine-tuning
SFT: Supervised Fine-Tuning
PEFT: Parameter-Efficient Fine-Tuning

More Information [optional]

The training workflow is documented in notebooks/math_qa.ipynb within the aio-llmops repository.

Model Card Authors [optional]

ThuanNaN / aio-llmops contributors

Model Card Contact

[More Information Needed]


Model tree for VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora

Base model: meta-llama/Llama-3.2-1B-Instruct
Quantized: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
Adapter: this model

Dataset used to train VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora: TIGER-Lab/MathInstruct

Paper for VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora

Quantifying the Carbon Emissions of Machine Learning (arXiv:1910.09700, published Oct 21, 2019)