Phu Nguyen committed on
Commit 53d23f9 · verified · 1 Parent(s): 54cc0b9

Update README.md

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
@@ -36,19 +36,19 @@ Sampling Configs:
 
  Additionally, for Live-Code-Bench, we leverage [QWQ-Evaluation](https://github.com/QwenLM/QwQ/tree/main/eval) to reproduce results using a max context length of 32768, averaging over 8 runs.
 
- | Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | II-Thought-1.5B-Preview |
- |-----------|-------------------------------|--------------------------|
- | AMC23 | 68.48 | **79.41** |
- | AIME24 | 28.07 | **33.39** |
- | AIME25 | 22.6 | **25.68** |
- | Olympiad Bench | 42.04 | **51.63** |
- | Math500 | 82.3 | **86.8** |
- | Math Gakao 2023 English | 72.18 | **76.85** |
- | Minerva Math | 27.62 | **31.89** |
- | Vietnamese Entrance Math Exam | 39.85 | **45.12** |
- | LiveCodeBench | 16.66 | **19.84** |
- | IFEval | 41.95 | **45.56** |
- | **Average** | 44.175 | **49.61** |
+ | Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
+ |-----------------------------------------|------------------------------|---------------------------|-------------------------|
+ | **AMC23** | 69.69 | 54.26 | **79.77** |
+ | **AIME24** | 29.43 | 10.73 | **34.17** |
+ | **AIME25** | 23.39 | 8.8 | **26.09** |
+ | **Olympiad Bench** | 43.15 | 36.07 | **52.78** |
+ | **Math500** | 83.6 | 73.15 | **87.2** |
+ | **Math Gaokao 2023 English** | 72.99 | 62.47 | **77.21** |
+ | **Minerva Math** | 27.57 | 24.45 | **30.79** |
+ | **Vietnamese Entrance Math Exam** | 40.32 | 26.69 | **46.24** |
+ | **LiveCodeBench** | 16.66 | 2.6 | **19.84** |
+ | **IFEval** | 44.24 | 27.22 | **44.84** |
+ | **Average** | 45.10 | 32.64 | **49.90** |
 
  ## How To Use
  Our model can be utilized in the same manner as Qwen or Deepseek-R1-Distill models.
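
Review note: the **Average** row in the updated table can be sanity-checked with a short script. The sketch below assumes the average is an unweighted mean over the ten benchmark rows, which the README does not state explicitly. The scores are copied from the added (`+`) rows, and the `scores` and `unweighted_average` names are illustrative only. Because the inputs are already rounded, a recomputed value may differ from the published one by about 0.01.

```python
from statistics import mean

# Scores copied from the added ("+") rows of the diff, in row order:
# AMC23, AIME24, AIME25, Olympiad Bench, Math500, Math Gaokao 2023 English,
# Minerva Math, Vietnamese Entrance Math Exam, LiveCodeBench, IFEval
scores = {
    "DeepSeek-R1-Distill-Qwen-1.5B": [
        69.69, 29.43, 23.39, 43.15, 83.6, 72.99, 27.57, 40.32, 16.66, 44.24,
    ],
    "Qwen2.5-Math-1.5B-Instruct": [
        54.26, 10.73, 8.8, 36.07, 73.15, 62.47, 24.45, 26.69, 2.6, 27.22,
    ],
    "II-Thought-1.5B-Preview": [
        79.77, 34.17, 26.09, 52.78, 87.2, 77.21, 30.79, 46.24, 19.84, 44.84,
    ],
}


def unweighted_average(values):
    """Assumed aggregation: plain mean over all benchmark rows."""
    return mean(values)


averages = {model: unweighted_average(vals) for model, vals in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

If the published averages were computed from unrounded per-benchmark scores, small last-digit differences from this recomputation are expected and do not by themselves indicate an error in the table.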