Update README.md
Added AIME evaluation
README.md CHANGED

@@ -43,5 +43,10 @@ print(tokenizer.decode(output[0], skip_special_tokens=False))
 ```
 
 ### Evaluation
+#### MathQA
 The model was evaluated on a randomly sampled subset of 1,000 records from the test split of the [Math-QA](https://huggingface.co/datasets/rvv-karma/Math-QA) dataset.
-Math Genius 7B achieved an accuracy of 93.1% in producing the correct final answer under the pass@1 evaluation metric.
+Math Genius 7B achieved an accuracy of 93.1% in producing the correct final answer under the pass@1 evaluation metric.
+
+#### AIME
+Math Genius 7B was evaluated on [90 problems from AIME 22, AIME 23, and AIME 24](https://huggingface.co/datasets/AI-MO/aimo-validation-aime).
+The model successfully solved 3 of the 90 problems.
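For reference, pass@1 as used above means scoring a single sampled answer per problem by exact match on the final answer. A minimal sketch of that accuracy computation, assuming answers are compared as normalized strings (function and variable names are illustrative, not taken from the model's evaluation code):

```python
# Illustrative sketch of pass@1 accuracy: one predicted answer per problem,
# counted correct only if the final answer exactly matches the reference.
def pass_at_1(predictions, references):
    """Return the fraction of problems whose single prediction matches the reference."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

# Toy usage: 3 of 4 final answers match, so accuracy is 0.75.
preds = ["42", "7", "-1", "3.5"]
refs = ["42", "7", "-1", "4.0"]
print(pass_at_1(preds, refs))  # 0.75
```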