desaifan-mbzuai commited on
Commit
707e362
·
verified ·
1 Parent(s): 5df4dd1

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +10 -0
README.md CHANGED
@@ -74,6 +74,16 @@ Below we report performance across general, reasoning, mathematical, and coding
74
  | **Coding Tasks** | | | | | | | | | |
75
  | **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
76
  | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |
 
 
 
 
 
 
 
 
 
 
77
 
78
 
79
  Below we report the evaluation results for K2-V2 after supervised fine-tuning (SFT). These variants correspond to three levels of reasoning effort (Low < Medium < High).
 
74
  | **Coding Tasks** | | | | | | | | | |
75
  | **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
76
  | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |
77
+ | **Logic Puzzles** | | | | | | | | | |
78
+ | **COUNTDOWN** | 1.3 | <u>53.3</u> | 53.1 | 35.9 | **75.6** | 6.0 | 1.0 | 0.5 | 23.2 |
79
+ | **KK-4 PEOPLE** | 4.8 | 44.9 | <u>68.0</u> | 64.5 | **92.9** | 26.1 | 4.2 | 7.6 | 42.4 |
80
+ | **KK-8 PEOPLE** | 0.5 | 23.2 | 41.3 | <u>51.6</u> | **82.8** | 5.7 | 1.1 | 1.3 | 13.0 |
81
+ | **ORDER-15 ITEMS** | 4.7 | 30.7 | 47.2 | <u>55.8</u> | **87.6** | 37.0 | 3.5 | 4.5 | 25.0 |
82
+ | **ORDER-30 ITEMS** | 0.0 | 0.3 | 3.0 | <u>34.1</u> | **40.3** | 0.7 | 0.2 | 0.1 | 0.6 |
83
+ | **Instruction Following** | | | | | | | | | |
84
+ | **IFEVAL** | 17.4 | 26.2 | 28.5 | <u>34.5</u> | 26.7 | **40.3** | 15.1 | 17.4 | 13.2 |
85
+ | **Arabic** | | | | | | | | | |
86
+ | **MMLU-Arabic** | 65.4 | 66.1 | 64.5 | 66.6 | 65.5 | **74.1** | 65.0 | <u>66.8</u> | 47.8 |
87
 
88
 
89
  Below we report the evaluation results for K2-V2 after supervised fine-tuning (SFT). These variants correspond to three levels of reasoning effort (Low < Medium < High).