---
license: cc-by-nc-4.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-Math-1.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- math
- qwen
- lora
- mathematics
- gsm8k
---

# OpenMath
Fine-tuning a Small Language Model (SLM) for Step-by-Step Math Reasoning

## Overview

OpenMath is an open-source project focused on fine-tuning a small language model for math reasoning using QLoRA (4-bit LoRA).

This repository contains only a LoRA adapter trained on GSM8K. Users must load the base model separately and attach the adapter.

The latest version of this model was trained on an AMD MI300X GPU using ROCm, showing that modern non-NVIDIA accelerators can support fine-tuning workflows built on Hugging Face and PyTorch.

---
## Base Model

Qwen/Qwen2.5-Math-1.5B

This repository does not contain the base model weights; they must be loaded separately from Hugging Face.

---
## Hardware Used (Latest Training Run)

- GPU: AMD MI300X (ROCm 7.0)
- VRAM: 192 GB
- Operating system: Ubuntu 24.04
- Framework: PyTorch + Hugging Face
- Backend: ROCm

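
Not part of this repository's code, but as a quick sanity check that a ROCm build of PyTorch actually sees the accelerator (the same calls work on CUDA builds), something like the following can be run before training:

```python
import torch

# ROCm builds of PyTorch reuse the torch.cuda API, so these calls also cover AMD GPUs.
print(torch.__version__)              # build string, e.g. with a "+rocm" suffix on ROCm wheels
print(torch.version.hip)              # HIP/ROCm version; None on CUDA-only builds
print(torch.cuda.is_available())      # True if the GPU is visible to PyTorch
print(torch.cuda.get_device_name(0))  # reported device name
```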
---

## Dataset

GSM8K (Grade School Math 8K)
- Training samples: 1,000
- Evaluation: full GSM8K test split (1,319 problems)

Loss was computed only on the solution portion of each example; the prompt tokens were masked out of the loss, as sketched below.

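
The preprocessing script is not included in this repository; the sketch below shows one way the masking could be implemented, assuming a simple Problem/Solution prompt layout. The prompt text and the `tokenize_with_loss_mask` helper are illustrative, and the actual format used for training is defined by chat_template.jinja.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-1.5B")
train_data = load_dataset("openai/gsm8k", "main", split="train[:1000]")

def tokenize_with_loss_mask(example, max_len=1024):
    # Illustrative prompt layout; the trained template lives in chat_template.jinja.
    prompt = f"Problem:\n{example['question']}\n\nSolution:\n"
    solution = example["answer"] + tokenizer.eos_token

    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    solution_ids = tokenizer(solution, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + solution_ids)[:max_len]
    # Label -100 is ignored by the cross-entropy loss, so only solution tokens contribute.
    labels = ([-100] * len(prompt_ids) + solution_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}

tokenized = train_data.map(tokenize_with_loss_mask, remove_columns=train_data.column_names)
```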
---

## Training Configuration

- Method: QLoRA (4-bit)
- Quantization: NF4 with float16 compute
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj
- Max sequence length: 1024
- Batch size: 1
- Gradient accumulation: 16
- Effective batch size: 16
- Learning rate: 1e-4
- Optimizer: paged_adamw_8bit
- Scheduler: cosine
- Warmup: 5%
- Epochs: 6

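
The training script is not shipped with the adapter; the sketch below shows how these hyperparameters would typically be expressed with bitsandbytes, PEFT, and the Hugging Face `TrainingArguments` (the `output_dir` name is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # float16 compute
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-1.5B", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size 16
    learning_rate=1e-4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=6,
    fp16=True,
    output_dir="openmath-qlora",      # placeholder output directory
)
```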
---

## Results

GSM8K accuracy (full test set): 750 / 1,319 correct = 56.86%.

This is a substantial improvement over the earlier Colab T4 run and a strong result for a 1.5B-parameter model fine-tuned with LoRA.

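
The evaluation harness itself is not included here. GSM8K is usually scored by comparing the number after the "####" marker in the reference solution with the last number the model produces; a rough sketch of that scoring, with hypothetical helper names:

```python
import re

def extract_gold_answer(reference: str) -> str:
    # GSM8K reference solutions end with "#### <final answer>".
    return reference.split("####")[-1].strip().replace(",", "")

def extract_predicted_answer(generation: str) -> str:
    # Take the last number in the generation as the model's final answer.
    numbers = re.findall(r"-?\d+\.?\d*", generation.replace(",", ""))
    return numbers[-1] if numbers else ""

def is_correct(generation: str, reference: str) -> bool:
    try:
        return float(extract_predicted_answer(generation)) == float(extract_gold_answer(reference))
    except ValueError:
        return False

# accuracy = sum(is_correct(g, r) for g, r in zip(generations, references)) / len(references)
```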
---

## What This Repository Contains

- adapter_model.safetensors – LoRA weights
- adapter_config.json – LoRA configuration
- chat_template.jinja – chat formatting template
- tokenizer.json – tokenizer file
- tokenizer_config.json – tokenizer settings
- README.md – documentation

This repository does not include checkpoints, optimizer states, or the full base model weights.

---
## How to Use This Model

Load the base model Qwen/Qwen2.5-Math-1.5B from Hugging Face, then attach this LoRA adapter using PEFT. Generate answers using a prompt that includes an instruction, problem, and solution section, as in the sketch below.

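
A minimal loading and generation sketch with transformers and PEFT. The adapter path below is a placeholder for this repository's Hub ID or a local download, and the prompt layout is illustrative (the trained format is defined by chat_template.jinja). Greedy decoding is used here; sampling settings can be adjusted to reduce repetition.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-Math-1.5B"
adapter_id = "path/to/this-adapter"  # replace with this repo's Hub ID or a local directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = (
    "Instruction: Solve the math problem step by step.\n\n"
    "Problem:\nNatalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?\n\n"
    "Solution:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```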
---

## Why This Matters

- This project demonstrates that the AMD MI300X can train modern language models with Hugging Face and QLoRA.
- It shows that high-quality math reasoning is possible at 1.5B parameters using efficient fine-tuning.
- It provides a lightweight adapter instead of requiring users to download a massive full model.

---
## Limitations

- The model can make reasoning mistakes.
- It should not be used for exams, assignments, or professional decisions.
- Performance depends heavily on prompt formatting.

---
## Future Work

- Train on 3,000 to 5,000 GSM8K samples.
- Add the SVAMP and ASDiv datasets.
- Improve decoding to reduce repetition.
- Experiment with multi-GPU scaling on the MI300X.
- Add a Streamlit demo for interactive use.

---
## License

CC BY-NC 4.0 (cc-by-nc-4.0)