Update README.md
Added AIME evaluation
README.md CHANGED

@@ -43,5 +43,10 @@ print(tokenizer.decode(output[0], skip_special_tokens=False))
 ```
 
 ### Evaluation
+#### MathQA
 The model was evaluated on a randomly sampled subset of 1,000 records from the test split of the [Math-QA](https://huggingface.co/datasets/rvv-karma/Math-QA) dataset.
-Math Genius 7B achieved an accuracy of 93.1% in producing the correct final answer under the pass@1 evaluation metric.
+Math Genius 7B achieved an accuracy of 93.1% in producing the correct final answer under the pass@1 evaluation metric.
+
+#### AIME
+Math Genius 7B was evaluated on [90 problems from AIME 22, AIME 23, and AIME 24](https://huggingface.co/datasets/AI-MO/aimo-validation-aime).
+The model successfully solved 3 of the 90 problems.
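For reference, pass@1 as used above means scoring a single sampled answer per problem by exact match on the final answer. A minimal sketch of that accuracy computation, assuming answers are compared as normalized strings (function and variable names are illustrative, not taken from the model's evaluation code):

```python
# Illustrative sketch of pass@1 accuracy: one predicted answer per problem,
# counted correct only if the final answer exactly matches the reference.
def pass_at_1(predictions, references):
    """Return the fraction of problems whose single prediction matches the reference."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

# Toy usage: 3 of 4 final answers match, so accuracy is 0.75.
preds = ["42", "7", "-1", "3.5"]
refs = ["42", "7", "-1", "4.0"]
print(pass_at_1(preds, refs))  # 0.75
```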