microsoft/Phi-4-mini-flash-reasoning

renll committed on Jun 22, 2025

Commit 068f2c0 (verified) · 1 Parent(s): 452fd5f

Update README.md

Files changed (1)

README.md +1 -1


@@ -239,5 +239,5 @@ We include a brief word on methodology here - and in particular, how we think ab
 Benchmark datasets
 We evaluate the model with three of the most popular math benchmarks where the strongest reasoning models are competing together. Specifically:
 + Math-500: This benchmark consists of 500 challenging math problems designed to test the model's ability to perform complex mathematical reasoning and problem-solving.
-+ AIME 2024/AIME 2025: The American Invitational Mathematics Examination (AIME) is a highly regarded math competition that features a series of difficult problems aimed at assessing advanced mathematical skills and logical reasoning. We evaluate the models on the problems from both the year 2024 and the latest year 2025 examinations.
++ AIME 2024/AIME 2025: The American Invitational Mathematics Examination (AIME) is a highly regarded math competition that features a series of difficult problems aimed at assessing advanced mathematical skills and logical reasoning. We evaluate the models on the problems from both 2024 and the year 2025 examinations.
 + GPQA Diamond: The Graduate-Level Google-Proof Q&A (GPQA) Diamond benchmark focuses on evaluating the model's ability to understand and solve a wide range of mathematical questions, including both straightforward calculations and more intricate problem-solving tasks.