---
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
license: mit
datasets:
- fdyrd/MATH
base_model:
- Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
- text-generation-inference
metrics:
- accuracy
---

# QwenMath

A text-generation LLM that solves math problems, fine-tuned from Qwen/Qwen2.5-0.5B with LoRA.

## Training Statistics
```yaml
training-method: lora
training-time: "5:42"
data-size: 500
epoch: 3
total_flos: "1372250GF"
train_loss: 0.6441
train_samples_per_second: 4.385
train_steps_per_second: 0.544
```
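
The figures above can be cross-checked against each other. The sketch below is only a sanity check, and it rests on two assumptions that the card does not state: that `training-time` is in `minutes:seconds`, and that `train_steps_per_second` counts optimizer steps (as in a Hugging Face `Trainer`-style log). Under those assumptions the throughput implies about 342 seconds of training and roughly 8 samples per step.

```python
# Values copied from the training statistics above.
data_size = 500
epochs = 3
samples_per_second = 4.385
steps_per_second = 0.544

# Assumption: "5:42" means 5 minutes 42 seconds.
minutes, seconds = (int(part) for part in "5:42".split(":"))
train_seconds = minutes * 60 + seconds

# Total samples processed if every example is seen once per epoch.
samples_seen = data_size * epochs

# Training time implied by the reported sample throughput.
implied_seconds = samples_seen / samples_per_second

# Samples per step implied by the two throughput figures
# (an effective batch size, if steps are optimizer steps).
implied_batch_size = samples_per_second / steps_per_second

if __name__ == "__main__":
    print(f"reported wall-clock seconds: {train_seconds}")
    print(f"seconds implied by throughput: {implied_seconds:.1f}")
    print(f"implied samples per step: {implied_batch_size:.2f}")
```

The reported and implied durations agree to within a second, which supports the `minutes:seconds` reading.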
## Validation Set Performance

Dataset used: test split of [fdyrd/MATH](https://huggingface.co/datasets/fdyrd/MATH).

Metric: accuracy. Each cell reads `accuracy : number of problems` for that subject and difficulty level.

<table>
  <tr>
    <th> Level </th>
    <th> Algebra </th>
    <th> Intermediate Algebra </th>
    <th> Prealgebra </th>
    <th> Precalculus </th>
    <th> Number Theory </th>
    <th> Geometry </th>
    <th> Counting &amp; Probability </th>
    <th> Average </th>
  </tr>
  <tr>
    <td> Level 1 </td>
    <td> 0.541 : 135 </td>
    <td> 0.192 : 52 </td>
    <td> 0.477 : 86 </td>
    <td> 0.228 : 57 </td>
    <td> 0.467 : 30 </td>
    <td> 0.263 : 38 </td>
    <td> 0.359 : 39 </td>
    <td> 0.361 </td>
  </tr>
  <tr>
    <td> Level 2 </td>
    <td> 0.323 : 201 </td>
    <td> 0.109 : 128 </td>
    <td> 0.367 : 177 </td>
    <td> 0.044 : 113 </td>
    <td> 0.380 : 92 </td>
    <td> 0.134 : 82 </td>
    <td> 0.248 : 101 </td>
    <td> 0.229 </td>
  </tr>
  <tr>
    <td> Level 3 </td>
    <td> 0.291 : 261 </td>
    <td> 0.046 : 195 </td>
    <td> 0.308 : 224 </td>
    <td> 0.000 : 127 </td>
    <td> 0.262 : 122 </td>
    <td> 0.088 : 102 </td>
    <td> 0.160 : 100 </td>
    <td> 0.165 </td>
  </tr>
  <tr>
    <td> Level 4 </td>
    <td> 0.180 : 283 </td>
    <td> 0.024 : 248 </td>
    <td> 0.220 : 191 </td>
    <td> 0.009 : 114 </td>
    <td> 0.169 : 142 </td>
    <td> 0.064 : 125 </td>
    <td> 0.090 : 111 </td>
    <td> 0.108 </td>
  </tr>
  <tr>
    <td> Level 5 </td>
    <td> 0.088 : 307 </td>
    <td> 0.004 : 280 </td>
    <td> 0.104 : 193 </td>
    <td> 0.000 : 135 </td>
    <td> 0.136 : 154 </td>
    <td> 0.023 : 132 </td>
    <td> 0.065 : 123 </td>
    <td> 0.060 </td>
  </tr>
  <tr>
    <td> Average </td>
    <td> 0.285 </td>
    <td> 0.075 </td>
    <td> 0.295 </td>
    <td> 0.056 </td>
    <td> 0.283 </td>
    <td> 0.114 </td>
    <td> 0.184 </td>
    <td> 0.166 </td>
  </tr>
</table>
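
The averages in the table seem to follow two conventions, which matters when comparing against other reports. This is inferred from the numbers rather than stated by the authors: the per-level and per-subject averages match unweighted means of the cell accuracies, while the bottom-right value matches the accuracy over all problems, with each cell weighted by its problem count. The sketch below reproduces both from the table values; the names `TABLE`, `level_average`, `subject_average`, and `overall_accuracy` are illustrative only.

```python
# Accuracy and problem count per cell, copied from the table above.
# Each row is one difficulty level; columns follow SUBJECTS order.
SUBJECTS = ["Algebra", "Intermediate Algebra", "Prealgebra", "Precalculus",
            "Number Theory", "Geometry", "Counting & Probability"]

TABLE = {
    1: [(0.541, 135), (0.192, 52), (0.477, 86), (0.228, 57),
        (0.467, 30), (0.263, 38), (0.359, 39)],
    2: [(0.323, 201), (0.109, 128), (0.367, 177), (0.044, 113),
        (0.380, 92), (0.134, 82), (0.248, 101)],
    3: [(0.291, 261), (0.046, 195), (0.308, 224), (0.000, 127),
        (0.262, 122), (0.088, 102), (0.160, 100)],
    4: [(0.180, 283), (0.024, 248), (0.220, 191), (0.009, 114),
        (0.169, 142), (0.064, 125), (0.090, 111)],
    5: [(0.088, 307), (0.004, 280), (0.104, 193), (0.000, 135),
        (0.136, 154), (0.023, 132), (0.065, 123)],
}


def level_average(level):
    """Unweighted mean of the seven subject accuracies at one level."""
    cells = TABLE[level]
    return sum(acc for acc, _ in cells) / len(cells)


def subject_average(subject):
    """Unweighted mean of the five level accuracies for one subject."""
    col = SUBJECTS.index(subject)
    return sum(TABLE[level][col][0] for level in TABLE) / len(TABLE)


def overall_accuracy():
    """Accuracy over all problems, weighting each cell by its count."""
    cells = [cell for row in TABLE.values() for cell in row]
    total = sum(count for _, count in cells)
    return sum(acc * count for acc, count in cells) / total


if __name__ == "__main__":
    for level in TABLE:
        print(f"Level {level}: {level_average(level):.3f}")
    for subject in SUBJECTS:
        print(f"{subject}: {subject_average(subject):.3f}")
    print(f"Overall: {overall_accuracy():.3f}")
```

Because the cell accuracies are already rounded to three decimals, the recomputed values are only expected to agree with the table after rounding.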
## Test Set Performance

```json
[
  {
    "dataset": "MATH500",
    "url": "https://huggingface.co/datasets/qq8933/MATH500",
    "accuracy": 0.286
  },
  {
    "dataset": "GSM8K",
    "url": "https://huggingface.co/datasets/openai/gsm8k",
    "accuracy": 0.382
  }
]
```