renll commited on
Commit
068f2c0
·
verified ·
1 Parent(s): 452fd5f

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -239,5 +239,5 @@ We include a brief word on methodology here - and in particular, how we think ab
239
  Benchmark datasets
240
  We evaluate the model with three of the most popular math benchmarks where the strongest reasoning models are competing together. Specifically:
241
  + Math-500: This benchmark consists of 500 challenging math problems designed to test the model's ability to perform complex mathematical reasoning and problem-solving.
242
- + AIME 2024/AIME 2025: The American Invitational Mathematics Examination (AIME) is a highly regarded math competition that features a series of difficult problems aimed at assessing advanced mathematical skills and logical reasoning. We evaluate the models on the problems from both the year 2024 and the latest year 2025 examinations.
243
  + GPQA Diamond: The Graduate-Level Google-Proof Q&A (GPQA) Diamond benchmark focuses on evaluating the model's ability to understand and solve a wide range of mathematical questions, including both straightforward calculations and more intricate problem-solving tasks.
 
239
  Benchmark datasets
240
  We evaluate the model with three of the most popular math benchmarks where the strongest reasoning models are competing together. Specifically:
241
  + Math-500: This benchmark consists of 500 challenging math problems designed to test the model's ability to perform complex mathematical reasoning and problem-solving.
242
+ + AIME 2024/AIME 2025: The American Invitational Mathematics Examination (AIME) is a highly regarded math competition that features a series of difficult problems aimed at assessing advanced mathematical skills and logical reasoning. We evaluate the models on the problems from both 2024 and the year 2025 examinations.
243
  + GPQA Diamond: The Graduate-Level Google-Proof Q&A (GPQA) Diamond benchmark focuses on evaluating the model's ability to understand and solve a wide range of mathematical questions, including both straightforward calculations and more intricate problem-solving tasks.