---
library_name: transformers
license: apache-2.0
datasets:
- Ashed00/combined_math_problems
- openai/gsm8k
- deepmind/aqua_rat
base_model:
- HuggingFaceTB/SmolLM2-135M
---
# SmolMath-135M
SmolMath is a fully fine-tuned version of the 135M-parameter SmolLM2 model, trained to maximize math accuracy with the least possible drop on other text benchmarks.
**Important**: All training code is available on [GitHub](https://github.com/Ashu-00/SmolMath/).
**Important**: Please refer to the [blog post](https://hackmd.io/@ashu-00/SmolMath) for the methodology and training details.
## Usage
```python
from transformers import pipeline

model_path = "Ashed00/SmolMath-135M"  # Hugging Face Hub model ID (or a local path to the fine-tuned model)
pipe = pipeline("text-generation", model=model_path)

question = "What is 2+2?"
prompt = "Question: " + question + "\nAnswer:"

output = pipe(
    prompt,
    max_length=100,
    do_sample=False,  # disable sampling for greedy decoding
)[0]["generated_text"]
print(output)
```
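Since the model continues the `Question: ... \nAnswer:` template, the generated text contains the prompt followed by the answer. A minimal sketch for keeping only the answer span (this post-processing is an assumption for illustration, not part of the released code):
```python
# Keep only the text generated after the "Answer:" marker
# (assumed post-processing step, not part of the released pipeline).
answer = output.split("Answer:", 1)[-1].strip()
print(answer)
```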
## Evaluation and Performance
### Comparison with the Base Model
| **Metrics** | **SmolLM2-135M-8k** | **SmolMath-135M** | **Δ (Change)** |
|-------------------|---------------------|--------------------|----------------|
| HellaSwag | 42.1 | 41.15 | −0.95 |
| PIQA | 68.4 | 63.55 | −4.85 |
| CommonsenseQA | 33.9 | 33.42 | −0.48 |
| TriviaQA | 4.1 | 0.0 | −4.10 |
| Winogrande | 51.3 | 51.78 | +0.48 |
| OpenBookQA | 34.6 | 30.80 | −3.80 |
| GSM8K (0-shot)* | 0.0 | 6.9 | +6.90 |
*Evaluated with the lighteval script favoured by the SmolLM2 authors in their evaluations; its prompt format differs from the SmolMath prompt structure.
### Math Benchmarks
| Model | AddSub* (%) | MAWPS** (%) | GSM8K* (%) |
|-------------------------------------|-------------|-------------|------------|
| apple/OpenELM-270M-Instruct | 2.14 | 2.83 | 2.05 |
| HuggingFaceTB/SmolLM2-135M-Instruct | 1.52 | 4.04 | 0.45 |
| SmolMath without GRPO (ours) | 9.64 | 7.47 | 6.22 |
| SmolMath (ours) | **12.05** | **8.31** | **7.51** |
*Evaluated on the test split only; this split was not included in training.
**Evaluated on the complete dataset; this dataset was not included in training.
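For reference, a minimal sketch of how exact-match scoring on these benchmarks could be computed, assuming each example is a `{"question": ..., "answer": ...}` pair and taking the last number in the generation as the prediction. This is an illustrative reimplementation under those assumptions, not the actual evaluation script (see the GitHub repository for that):
```python
import re
from transformers import pipeline

pipe = pipeline("text-generation", model="Ashed00/SmolMath-135M")

def last_number(text: str):
    """Return the last number found in a string, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(examples):
    """examples: list of {'question': str, 'answer': str} dicts (assumed schema)."""
    correct = 0
    for ex in examples:
        prompt = "Question: " + ex["question"] + "\nAnswer:"
        generation = pipe(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"]
        pred = last_number(generation[len(prompt):])  # score only the continuation
        gold = last_number(ex["answer"])
        correct += int(pred is not None and gold is not None and abs(pred - gold) < 1e-4)
    return correct / len(examples)
```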
## Citation
If you use this model in your work, you can cite us:
```bibtex
@misc{SmolMath,
title = {Building SmolMath: A Math Reasoning SLM Under 150M Parameters},
url = {https://hackmd.io/@ashu-00/SmolMath},
author = {ashu-00},
month = {July},
year = {2025}
}
```