Additionally, for LiveCodeBench, we use [QWQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results with a maximum context length of 32768, averaged over 8 runs.
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|-----------------------------------|-------------------------------|----------------------------|-------------------------|
| **AMC23** | 69.69 | 54.26 | **79.77** |
| **AIME24** | 29.43 | 10.73 | **34.17** |
| **AIME25** | 23.39 | 8.80 | **26.09** |
| **Olympiad Bench** | 43.15 | 36.07 | **52.78** |
| **Math500** | 83.60 | 73.15 | **87.20** |
| **Math Gaokao 2023 English** | 72.99 | 62.47 | **77.21** |
| **Minerva Math** | 27.57 | 24.45 | **30.79** |
| **Vietnamese Entrance Math Exam** | 40.32 | 26.69 | **46.24** |
| **LiveCodeBench** | 16.66 | 2.60 | **19.84** |
| **IFEval** | 44.24 | 27.22 | **44.84** |
| **Average** | 45.10 | 32.64 | **49.90** |
## How To Use
Our model can be used in the same way as Qwen or DeepSeek-R1-Distill models.
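As a minimal sketch, the model can be loaded through the standard `transformers` chat interface, just like Qwen or DeepSeek-R1-Distill checkpoints. The repository id below is an assumption (the source does not state it) — substitute the actual model id from the model card.

```python
# Hedged sketch: standard causal-LM loading via Hugging Face transformers,
# mirroring how Qwen / DeepSeek-R1-Distill models are typically used.
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: hypothetical repo id -- replace with the real one from this model card.
model_id = "II-Thought-1.5B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s)/CPU
)

# Build a chat prompt with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Solve: what is 12 * 13?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models tend to emit long chains of thought, so allow
# a generous generation budget.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the model is used for long-form reasoning (and LiveCodeBench above is evaluated at a 32768 context length), serving it with a long `max_new_tokens` budget is the natural default.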