Model Card for thuanan/Llama-3.2-1B-Instruct-mathqa-lora
LoRA adapter for math instruction following, fine-tuned from the 4-bit quantized Llama 3.2 1B Instruct checkpoint.
Model Details
Model Description
This model is a PEFT/LoRA adapter trained to produce math problem-solving responses with step-by-step reasoning and a concise final answer. It was trained with an Unsloth + TRL SFT workflow and pushed to the Hugging Face Hub.
- Developed by: ThuanNaN / project contributors
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: thuanan
- Model type: Causal language model adapter (LoRA) for instruction-following generation
- Language(s) (NLP): English
- License: [More Information Needed]
- Finetuned from model [optional]: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
Model Sources [optional]
- Repository: https://github.com/ThuanNaN/aio-llmops
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
- Math question answering in chat-style assistants
- Educational reasoning-style responses for math instructions
Downstream Use [optional]
- Can be mounted as an adapter in vLLM/Transformers serving stacks
- Can be integrated into tutoring or evaluation workflows with output verification
Out-of-Scope Use
- High-stakes decision-making where mathematically incorrect outputs can cause harm
- Automated grading/assessment without human review
- Domains requiring formal symbolic guarantees
Bias, Risks, and Limitations
- The model can still produce arithmetic and reasoning errors.
- The model may hallucinate invalid steps while sounding confident.
- Training used only a subset of the full MathInstruct data.
- Because the adapter sits on a 1B-parameter base model, performance may degrade on complex multi-step tasks.
Recommendations
- Verify final answers with deterministic tools or human review.
- Use constrained decoding and post-checking for critical tasks.
- Add guardrails for uncertainty disclosure in user-facing apps.
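The first recommendation can be sketched as a deterministic post-check. This is a minimal sketch, assuming the response ends with an answer of the form "x = <number>" and that the problem is a linear equation a*x + b = c; the answer format and the helpers `extract_final_answer` and `check_linear_solution` are hypothetical and not part of the repository.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical answer format: "x = <integer, decimal, or fraction>".
_ANSWER_RE = re.compile(r"x\s*=\s*(-?\d+(?:\.\d+|/\d+)?)")


def extract_final_answer(text: str) -> Optional[Fraction]:
    """Return the last 'x = <number>' in the model output, or None if absent."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    try:
        return Fraction(matches[-1])
    except (ValueError, ZeroDivisionError):
        return None


def check_linear_solution(a, b, c, x) -> bool:
    """Check a*x + b == c exactly, using rational arithmetic."""
    return Fraction(a) * x + Fraction(b) == Fraction(c)


# Illustrative response text, not actual model output.
response = "Subtract 5 from both sides: 2x = 12. Divide by 2. Final answer: x = 6"
answer = extract_final_answer(response)
verified = answer is not None and check_linear_solution(2, 5, 17, answer)
print(answer, verified)
```

Exact rational arithmetic avoids floating-point false negatives; a response with no parseable answer is treated as unverified rather than as wrong.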
How to Get Started with the Model
Use the code below to get started with the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit"
adapter_id = "thuanan/Llama-3.2-1B-Instruct-mathqa-lora"

# Load the tokenizer from the adapter repo, then attach the LoRA weights to the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

messages = [
    {
        "role": "system",
        "content": "You are a helpful math tutor. Solve the problem with clear reasoning and end with a concise final answer.",
    },
    {"role": "user", "content": "Solve: 2x + 5 = 17"},
]

# The Llama 3 chat template already emits the BOS token, so special tokens are
# skipped at tokenization time to avoid adding a second one.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_p=0.9,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, not the prompt.
generated = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
Training Details
Training Data
- Dataset: TIGER-Lab/MathInstruct
- Split strategy: 100 samples are held out for validation first, then 3% of the remaining train split is sampled for training
- Fields used: instruction, output
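The split strategy above can be sketched on row indices alone. This is a minimal sketch under stated assumptions: the function `split_dataset`, the use of seed 42 for the split, and the row count of 10,100 are hypothetical; the actual notebook may split with the Datasets library instead.

```python
import random


def split_dataset(num_rows, val_size=100, train_fraction=0.03, seed=42):
    """Return (val_indices, train_indices) for a dataset with num_rows rows."""
    rng = random.Random(seed)
    indices = list(range(num_rows))
    rng.shuffle(indices)

    # Hold out the validation rows first so they can never appear in training.
    val_indices = indices[:val_size]
    remaining = indices[val_size:]

    # Sample 3% of what is left for training.
    train_count = round(len(remaining) * train_fraction)
    train_indices = remaining[:train_count]
    return val_indices, train_indices


# Hypothetical dataset size, chosen only to keep the arithmetic easy to follow.
val_idx, train_idx = split_dataset(num_rows=10_100)
print(len(val_idx), len(train_idx))
```

Taking the validation rows before sampling keeps the two sets disjoint by construction, and the fixed seed makes the split reproducible.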
Training Procedure
Training used supervised fine-tuning (SFT) with chat-formatted prompts:
- system: math tutor instruction
- user: problem/instruction
- assistant: reference solution
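The three-role format above can be sketched as a small conversion function. This is a minimal sketch: `to_messages` is a hypothetical helper, and reusing the system prompt from the inference example for training is an assumption, not something the card confirms.

```python
# Assumption: training used the same system prompt as the inference example.
SYSTEM_PROMPT = (
    "You are a helpful math tutor. Solve the problem with clear reasoning "
    "and end with a concise final answer."
)


def to_messages(sample: dict) -> list:
    """Map one dataset row (instruction, output) to a chat conversation."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": sample["instruction"]},
        {"role": "assistant", "content": sample["output"]},
    ]


# Illustrative row, not taken from the dataset.
sample = {"instruction": "Solve: 2x + 5 = 17", "output": "2x = 12, so x = 6."}
messages = to_messages(sample)
print([m["role"] for m in messages])
```

A list in this shape can be passed to `tokenizer.apply_chat_template(messages, tokenize=False)` to render the training text.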
Preprocessing [optional]
- Converted each sample into chat conversation text via tokenizer chat template
- Tokenized with truncation and max sequence length of 2048
Training Hyperparameters
- Training regime: bf16 mixed precision when supported, otherwise fp16 mixed precision
- Max sequence length: 2048
- Epochs: 5
- Learning rate: 2e-4
- Weight decay: 0.01
- Warmup steps: 200
- LR scheduler: cosine
- Per-device train batch size: 8
- Per-device eval batch size: 8
- Gradient accumulation steps: 2
- Optimizer: paged_adamw_8bit
- Evaluation strategy: steps (every 100)
- Checkpoint save strategy: steps (every 100), keep last 2
- Early stopping: patience=2, threshold=0.0
- LoRA rank: 16
- LoRA alpha: 16
- LoRA dropout: 0
- Seed: 42
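Two quantities follow from the hyperparameters above: the effective batch size and the learning rate at a given step. The sketch below assumes the usual linear-warmup-then-cosine-decay shape; `lr_at_step`, the single-device default, and the total step count of 1000 are hypothetical, since the card does not report them.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 200
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length; the card does not state the total number of steps.
TOTAL_STEPS = 1000
for step in (0, 100, 200, 600, 1000):
    print(step, lr_at_step(step, TOTAL_STEPS))
print("effective batch size:", effective_batch_size())
```

With a per-device batch of 8 and 2 accumulation steps, each optimizer update sees 16 samples per device.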
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
- 100-sample validation holdout from TIGER-Lab/MathInstruct
Factors
- General math instruction and solution generation prompts
- Multi-step reasoning quality and answer correctness
Metrics
- eval_loss during validation
- Qualitative generation inspection on held-out examples
Results
- Training tracked eval_loss and, at the end of training, restored and saved the checkpoint with the lowest eval_loss.
- Additional manual spot-check generation was performed in notebook inference cells.
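The interaction between best-checkpoint selection and early stopping (patience=2, threshold=0.0) can be sketched as follows. This is a minimal sketch of the general technique, not the trainer's actual callback; `run_early_stopping` is a hypothetical helper and the loss values are made up for illustration, not measured results.

```python
def run_early_stopping(eval_losses, patience=2, threshold=0.0):
    """Return (best_index, stop_index) over a sequence of eval_loss values.

    best_index is the evaluation with the lowest loss seen before stopping;
    stop_index is the evaluation at which training would halt.
    """
    best_loss = float("inf")
    best_index = -1
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if loss < best_loss - threshold:
            # Improvement: remember this checkpoint and reset the counter.
            best_loss, best_index, bad_evals = loss, i, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_index, i
    return best_index, len(eval_losses) - 1


# Made-up eval_loss values, one per evaluation (every 100 steps in the card).
losses = [1.20, 1.05, 0.98, 0.99, 1.01, 0.97]
best, stopped = run_early_stopping(losses)
print(best, stopped)
```

In this illustration training halts at the fifth evaluation, so the later, lower value is never reached; this is the usual trade-off of a small patience.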
Summary
The adapter improves math instruction-following style and reasoning format for the target dataset subset, but outputs still require verification for correctness.
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
- Base architecture: Llama 3.2 1B Instruct (4-bit quantized base checkpoint)
- Adaptation method: LoRA on attention and MLP projection modules
- Objective: next-token prediction under supervised instruction-following format
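The LoRA adaptation listed above adds a scaled low-rank update to a frozen projection. The sketch below illustrates that idea in plain Python with row-vector inputs, y = xW + (alpha / r)(xA)B; the tiny matrices, the helper names, and the 2048x2048 projection size are hypothetical and chosen only for illustration. The scale alpha / r is kept at 1.0, matching the card's rank 16 and alpha 16.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, alpha, r):
    """Frozen projection plus scaled low-rank update: xW + (alpha / r) * (xA)B."""
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added for one adapted projection."""
    return r * (d_in + d_out)


x = [[1.0, 2.0]]                      # one input row, d_in = 2
W = [[1.0, 0.0], [0.0, 1.0]]          # frozen weight, d_in x d_out
A = [[1.0, 0.0], [0.0, 1.0]]          # down-projection, d_in x r
B_zero = [[0.0, 0.0], [0.0, 0.0]]     # zero-initialized up-projection
B = [[0.5, 0.0], [0.0, 0.5]]          # up-projection after some training

print(lora_forward(x, W, A, B_zero, alpha=2, r=2))
print(lora_forward(x, W, A, B, alpha=2, r=2))
print(lora_param_count(2048, 2048, 16))
```

With the up-projection initialized to zero the adapted layer reproduces the frozen layer exactly, which is why training starts from the base model's behavior.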
Compute Infrastructure
[More Information Needed]
Hardware
- CUDA GPU expected for training (bf16 if supported)
Software
- PyTorch 2.10.0+cu130
- Unsloth
- TRL
- Transformers
- Datasets
- PEFT
Citation [optional]
BibTeX:
@misc{aio_llmops_mathqa_lora_2026,
title={Llama-3.2-1B-Instruct-mathqa-lora},
author={ThuanNaN and contributors},
year={2026},
howpublished={\url{https://huggingface.co/thuanan/Llama-3.2-1B-Instruct-mathqa-lora}}
}
APA:
ThuanNaN, & contributors. (2026). Llama-3.2-1B-Instruct-mathqa-lora. Hugging Face. https://huggingface.co/thuanan/Llama-3.2-1B-Instruct-mathqa-lora
Glossary [optional]
- LoRA: Low-Rank Adaptation for parameter-efficient fine-tuning
- SFT: Supervised Fine-Tuning
- PEFT: Parameter-Efficient Fine-Tuning
More Information [optional]
The training workflow is documented in notebooks/math_qa.ipynb within the aio-llmops repository.
Model Card Authors [optional]
ThuanNaN / aio-llmops contributors
Model Card Contact
[More Information Needed]
Model tree for VLAI-AIVN/Llama-3.2-1B-Instruct-mathqa-lora
Base model
meta-llama/Llama-3.2-1B-Instruct