entfane committed on
Commit
625742c
·
verified ·
1 Parent(s): 786de95

Update README.md

Added AIME evaluation

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -43,5 +43,10 @@ print(tokenizer.decode(output[0], skip_special_tokens=False))
43
  ```
44
 
45
  ### Evaluation
 
46
  The model was evaluated on a randomly sampled subset of 1,000 records from the test split of the [Math-QA](https://huggingface.co/datasets/rvv-karma/Math-QA) dataset.
47
- Math Genius 7B achieved an accuracy of 93.1% in producing the correct final answer under the pass@1 evaluation metric.
43
  ```
44
 
45
  ### Evaluation
46
+ #### MathQA
47
  The model was evaluated on a randomly sampled subset of 1,000 records from the test split of the [Math-QA](https://huggingface.co/datasets/rvv-karma/Math-QA) dataset.
48
+ Math Genius 7B achieved an accuracy of 93.1% in producing the correct final answer under the pass@1 evaluation metric.
49
+
50
+ #### AIME
51
+ Math Genius 7B was evaluated on [90 problems from AIME 2022, 2023, and 2024](https://huggingface.co/datasets/AI-MO/aimo-validation-aime).
52
+ The model solved 3 of the 90 problems (about 3.3%).
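
For readers who want to check the reported figures, here is a minimal sketch of how a pass@1 accuracy could be computed. It assumes one generated answer per problem and exact string matching on the final answer. The function name `pass_at_1` and the sample answers are hypothetical, and the README does not describe the actual scoring script.

```python
def pass_at_1(predictions, references):
    """Return the fraction of problems whose single generated answer
    matches the reference answer exactly (after trimming whitespace).

    Assumption: exact string match; the real evaluation may normalize
    answers differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical toy data, not taken from either benchmark.
toy_predictions = ["12", "7", "103"]
toy_references = ["12", "8", "103"]
toy_score = pass_at_1(toy_predictions, toy_references)

# Arithmetic behind the reported AIME result: 3 solved out of 90.
aime_accuracy = 3 / 90
print(f"toy pass@1: {toy_score:.3f}")
print(f"AIME accuracy: {aime_accuracy:.1%}")
```

Under the same arithmetic, 93.1% on a 1,000-record Math-QA subset corresponds to 931 correct final answers.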