---
language:
- en
license: mit
datasets:
- fdyrd/MATH
base_model:
- Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
- text-generation-inference
metrics:
- accuracy
---
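# QwenMath
A text-generation LLM that solves math problems, fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) with LoRA on [fdyrd/MATH](https://huggingface.co/datasets/fdyrd/MATH).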
## Training Statistics
```yaml
training-method: lora
training-time: "5:42"  # minutes:seconds
data-size: 500
epoch: 3
total_flos: "1372250GF"
train_loss: 0.6441
train_samples_per_second: 4.385
train_steps_per_second: 0.544
```
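The throughput figures above can be cross-checked against the reported training time. The following is a minimal sketch, assuming that `data-size` counts training examples, that every example is seen once per epoch, and that `training-time` is given as minutes:seconds. The variable names are illustrative and do not come from the training script.

```python
# Values copied from the training statistics above.
data_size = 500             # assumed: number of training examples
epochs = 3
samples_per_second = 4.385
steps_per_second = 0.544

# Total examples processed, assuming each example is seen once per epoch.
total_samples = data_size * epochs

# Wall-clock time implied by the sample throughput.
seconds = total_samples / samples_per_second
minutes, rem = divmod(round(seconds), 60)
training_time = f"{minutes}:{rem:02d}"

# Samples per optimizer step, i.e. the implied effective batch size.
effective_batch_size = samples_per_second / steps_per_second

print(training_time)                    # 5:42
print(round(effective_batch_size, 2))   # 8.06
```

Under these assumptions the derived time agrees with the reported `5:42`, and the ratio of the two throughput values suggests an effective batch size of about 8.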
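## Validation Set Performance
Dataset used: test split of [fdyrd/MATH](https://huggingface.co/datasets/fdyrd/MATH).

Metric: accuracy. Each cell shows `accuracy : number of problems`. Row and column averages are unweighted means of the listed accuracies; the bottom-right value (0.166) is the overall accuracy across all 5,000 problems.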
<table>
  <tr>
    <th> Level </th>
    <th> Algebra </th>
    <th> Intermediate Algebra </th>
    <th> Prealgebra </th>
    <th> Precalculus </th>
    <th> Number Theory </th>
    <th> Geometry </th>
    <th> Counting &amp; Probability </th>
    <th> Average </th>
  </tr>
  <tr>
    <td> Level 1 </td>
    <td> 0.541 : 135 </td>
    <td> 0.192 : 52 </td>
    <td> 0.477 : 86 </td>
    <td> 0.228 : 57 </td>
    <td> 0.467 : 30 </td>
    <td> 0.263 : 38 </td>
    <td> 0.359 : 39 </td>
    <td> 0.361 </td>
  </tr>
  <tr>
    <td> Level 2 </td>
    <td> 0.323 : 201 </td>
    <td> 0.109 : 128 </td>
    <td> 0.367 : 177 </td>
    <td> 0.044 : 113 </td>
    <td> 0.38 : 92 </td>
    <td> 0.134 : 82 </td>
    <td> 0.248 : 101 </td>
    <td> 0.229 </td>
  </tr>
  <tr>
    <td> Level 3 </td>
    <td> 0.291 : 261 </td>
    <td> 0.046 : 195 </td>
    <td> 0.308 : 224 </td>
    <td> 0.0 : 127 </td>
    <td> 0.262 : 122 </td>
    <td> 0.088 : 102 </td>
    <td> 0.16 : 100 </td>
    <td> 0.165 </td>
  </tr>
  <tr>
    <td> Level 4 </td>
    <td> 0.18 : 283 </td>
    <td> 0.024 : 248 </td>
    <td> 0.22 : 191 </td>
    <td> 0.009 : 114 </td>
    <td> 0.169 : 142 </td>
    <td> 0.064 : 125 </td>
    <td> 0.09 : 111 </td>
    <td> 0.108 </td>
  </tr>
  <tr>
    <td> Level 5 </td>
    <td> 0.088 : 307 </td>
    <td> 0.004 : 280 </td>
    <td> 0.104 : 193 </td>
    <td> 0.0 : 135 </td>
    <td> 0.136 : 154 </td>
    <td> 0.023 : 132 </td>
    <td> 0.065 : 123 </td>
    <td> 0.06 </td>
  </tr>
  <tr>
    <td> Average </td>
    <td> 0.285 </td>
    <td> 0.075 </td>
    <td> 0.295 </td>
    <td> 0.056 </td>
    <td> 0.283 </td>
    <td> 0.114 </td>
    <td> 0.184 </td>
    <td> 0.166 </td>
  </tr>
</table>
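The averages in the table can be reproduced from the per-cell values. The sketch below assumes each cell reads `accuracy : number of problems`; the helper names `macro_average` and `micro_average` are illustrative, not part of the original evaluation code. It suggests that the row and column averages are unweighted means, while the bottom-right value is weighted by problem count.

```python
# (accuracy, problem count) for Levels 1-5, copied from the table above.
results = {
    "Algebra": [(0.541, 135), (0.323, 201), (0.291, 261), (0.18, 283), (0.088, 307)],
    "Intermediate Algebra": [(0.192, 52), (0.109, 128), (0.046, 195), (0.024, 248), (0.004, 280)],
    "Prealgebra": [(0.477, 86), (0.367, 177), (0.308, 224), (0.22, 191), (0.104, 193)],
    "Precalculus": [(0.228, 57), (0.044, 113), (0.0, 127), (0.009, 114), (0.0, 135)],
    "Number Theory": [(0.467, 30), (0.38, 92), (0.262, 122), (0.169, 142), (0.136, 154)],
    "Geometry": [(0.263, 38), (0.134, 82), (0.088, 102), (0.064, 125), (0.023, 132)],
    "Counting & Probability": [(0.359, 39), (0.248, 101), (0.16, 100), (0.09, 111), (0.065, 123)],
}


def macro_average(cells):
    """Unweighted mean of the accuracies, rounded to 3 decimals."""
    return round(sum(acc for acc, _ in cells) / len(cells), 3)


def micro_average(cells):
    """Mean accuracy weighted by problem count, rounded to 3 decimals."""
    total = sum(n for _, n in cells)
    return round(sum(acc * n for acc, n in cells) / total, 3)


# Flatten the table, and pick out the Level 1 row across all subjects.
all_cells = [cell for cells in results.values() for cell in cells]
level_1 = [cells[0] for cells in results.values()]
total_problems = sum(n for _, n in all_cells)

print(macro_average(results["Algebra"]))  # column average for Algebra
print(macro_average(level_1))             # row average for Level 1
print(micro_average(all_cells))           # overall accuracy
print(total_problems)                     # number of evaluated problems
```

Because harder levels and subjects contain more problems, the count-weighted overall accuracy comes out lower than the unweighted mean of the row or column averages.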
## Test Set Performance
```json
[
  {
    "dataset": "MATH500",
    "url": "https://huggingface.co/datasets/qq8933/MATH500",
    "accuracy": 0.286
  },
  {
    "dataset": "GSM8K",
    "url": "https://huggingface.co/datasets/openai/gsm8k",
    "accuracy": 0.382
  }
]
```