---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Math-1.5B
pipeline_tag: question-answering
library_name: transformers
tags:
- math
- qwen
- gsm8k
- lora
---

# OpenMath

Fine-tuning a Small Language Model (SLM) for step-by-step math reasoning.

## Overview

**OpenMath** is an open-source project focused on fine-tuning a **small language model (SLM)** to solve **math word problems** with clear, step-by-step reasoning.
The project uses **LoRA/QLoRA fine-tuning** on popular math reasoning datasets and provides a benchmarking pipeline to compare performance against other open-source SLMs/LLMs.

The project is designed to be reproducible on a **free Colab (T4) GPU**.

---

## What’s Included

- QLoRA fine-tuning code (4-bit); a configuration sketch follows this list
- GSM8K subset training (example: 1k samples)
- GSM8K evaluation script (accuracy)
- Saved LoRA adapter weights
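
For reference, here is a minimal sketch of a 4-bit QLoRA setup, assuming the standard `transformers` + `peft` + `bitsandbytes` stack. The rank `r=16` matches the Training Setup below; `lora_alpha`, `lora_dropout`, and `target_modules` are illustrative assumptions, not the exact values used.

```python
# Minimal QLoRA setup sketch (transformers + peft + bitsandbytes).
# r=16 matches the Training Setup below; alpha/dropout/target_modules are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute fits a free Colab T4
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```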

---

## Base Model

- **Qwen2.5-Math-1.5B** (`Qwen/Qwen2.5-Math-1.5B`)

---

## Dataset

- **GSM8K** (Grade School Math 8K)
- Training: **1,000 samples** from the train split
- Evaluation: GSM8K test split (a loading sketch follows this list)
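
A loading sketch using the Hugging Face `datasets` library; the `openai/gsm8k` id and the `question`/`answer` fields come from the standard Hub release, and the 1,000-sample selection mirrors the subset size used here.

```python
# Sketch: load GSM8K and take a 1,000-sample training subset.
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main")
train_subset = gsm8k["train"].select(range(1000))  # 1k training samples
test_split = gsm8k["test"]                         # 1,319 test problems

example = train_subset[0]
print(example["question"])  # the word problem
print(example["answer"])    # step-by-step solution ending in "#### <answer>"
```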

---

## Results

### Training Setup (Current)

- Samples: 1,000
- Epochs: 6
- Max sequence length: 1,024 tokens
- LoRA rank: 16
- Loss masking: the loss is computed mainly on the **solution portion**, so training targets the reasoning steps rather than the question text (see the sketch after this list)
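
As a rough illustration of that masking, here is a minimal sketch assuming a plain question-then-solution layout (the project's actual prompt template may differ): tokens labeled `-100` are ignored by the cross-entropy loss in `transformers`, so only the solution contributes to training.

```python
# Sketch: mask question tokens so the loss is computed on the solution only.
# Labels of -100 are ignored by the cross-entropy loss in transformers.
def build_example(tokenizer, question, solution, max_length=1024):
    prompt_ids = tokenizer(question + "\n", add_special_tokens=False)["input_ids"]
    target_ids = tokenizer(solution, add_special_tokens=False)["input_ids"]
    target_ids = target_ids + [tokenizer.eos_token_id]

    input_ids = (prompt_ids + target_ids)[:max_length]
    labels = ([-100] * len(prompt_ids) + target_ids)[:max_length]  # prompt masked out
    return {"input_ids": input_ids, "labels": labels}
```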

### Accuracy

- **GSM8K accuracy (100-sample test subset): 41%**

> Note: The 41% score was measured on a **100-question subset** of the GSM8K test set for faster evaluation on Colab; a scoring sketch follows.
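
For context, GSM8K references end in `#### <number>`, and a common way to score generations is to compare that number with the last number the model produces. Below is a hedged sketch of that exact-match scoring; the project's actual evaluation script may normalize answers differently.

```python
# Sketch: GSM8K exact-match scoring against the "#### <number>" reference format.
import re

def extract_gold(answer: str) -> str:
    # GSM8K reference solutions end in "#### <number>"
    return answer.split("####")[-1].strip().replace(",", "")

def extract_pred(generation: str):
    # Take the last number in the model's generation as its final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return numbers[-1] if numbers else None

def accuracy(references, generations):
    hits = 0
    for ref, gen in zip(references, generations):
        pred = extract_pred(gen)
        hits += pred is not None and float(extract_gold(ref)) == float(pred)
    return hits / len(references)
```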

---

## GSM8K Leaderboard (Baselines)

| Model | Params | GSM8K Accuracy (%) |
|-------|--------|--------------------|
| LLaMA 2 | 13B | 28.7 |
| Gemma 2 (PT) | 2B | 23.9 |
| Mistral (Base) | 7B | 36.5 |
| ERNIE 4.5 | 21B | 25.2 |
| Baichuan (Base) | 13B | 26.6 |
| Gemma | 7B | 46.4 |
| Zephyr-7b-gemma-v0.1 | 7B | 45.56 |
| LLaMA 3.2 Instruct (CoT) | 1B | 39.04 |
| Gemma 3 IT | 1B | 42.15 |
| Qwen 3 (Instruct mode) | 1.7B | 33.66 |
| **OpenMath (Qwen2.5-Math-1.5B + LoRA)** | **1.5B** | **41.0** |

*OpenMath’s score is from the 100-question test subset noted above, so it is only indicative against full-test numbers.*

<img width="1090" height="590" alt="GSM8K accuracy comparison chart" src="https://github.com/user-attachments/assets/662ea336-8946-4542-b2f2-eb78712d5a2d" />

---

## Repository Files

### LoRA Adapter Folder

This project provides the fine-tuned adapter weights:

- `adapter_model.safetensors` → LoRA weights
- `adapter_config.json` → LoRA configuration
- Tokenizer + template files for correct formatting

> Note: This is **not a full model**.
> You must load the **base model** and then attach the adapter, as in the sketch below.
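
A minimal loading sketch with `peft`; `ADAPTER_ID` is a placeholder for this repository's actual Hub id.

```python
# Sketch: load the base model, then attach the LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "Qwen/Qwen2.5-Math-1.5B"
ADAPTER_ID = "your-username/OpenMath"  # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)  # ships the template files
model = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_ID)   # attach the adapter

prompt = "A train travels 60 miles per hour for 3 hours. How far does it go?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```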

---

## Disclaimer

OpenMath is an educational/research project.
The fine-tuned model may produce incorrect, incomplete, or misleading answers.
Always verify solutions independently before using them for exams, assignments, or real-world decisions.

This project does **not** guarantee correctness and should not be used as a substitute for professional advice.

---

## Contributing

Contributions are welcome! 🎉

If you’d like to contribute:

1. Fork the repository
2. Create a new branch (`feature/your-feature-name`)
3. Commit your changes
4. Open a Pull Request

### Contribution Ideas

- Run the full GSM8K test evaluation (1,319 samples) and report results
- Train on larger GSM8K subsets (3k–5k samples)
- Add SVAMP / ASDiv datasets for better generalization
- Improve decoding to reduce repetition
- Add a Streamlit demo for interactive testing
- Benchmark against more open-source SLMs/LLMs
- Improve evaluation scripts and metrics

---

## Note

OpenMath is a **fun and practical side project** built to explore **efficient fine-tuning (QLoRA)** and **math reasoning evaluation** on limited compute.
The goal is to learn, experiment, and share reproducible results while keeping the code clean and open for community improvements.

---

## License

This project is licensed under the **Apache License 2.0**.
See the [LICENSE](LICENSE) file for details.