| Category | Single turn | Multi turn |
|---|---|---|
| Math | 5.86 | 5.14 |
| Grammar | 4.71 | 1.29 |
| Understanding | 4.00 | 4.43 |
| Reasoning | 5.14 | 6.71 |
| Coding | 7.43 | 7.57 |
| Writing | 8.43 | 8.00 |

| Category | Score |
|---|---|
| Single turn | 5.93 |
| Multi turn | 5.52 |
| Overall | 5.73 |
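
The summary rows appear to be plain averages of the per-category scores. The sketch below checks that arithmetic, assuming the two score columns of the per-category table are single-turn and multi-turn in that order, and that each summary row is an unweighted mean (both are assumptions inferred from the numbers, not stated in the source).

```python
from statistics import mean

# Per-category scores copied from the per-category table.
# Assumption: each tuple is (single turn, multi turn).
scores = {
    "Math": (5.86, 5.14),
    "Grammar": (4.71, 1.29),
    "Understanding": (4.00, 4.43),
    "Reasoning": (5.14, 6.71),
    "Coding": (7.43, 7.57),
    "Writing": (8.43, 8.00),
}

# Assumption: summary rows are unweighted means over the six categories.
single_turn = mean(s for s, _ in scores.values())
multi_turn = mean(m for _, m in scores.values())
overall = mean([single_turn, multi_turn])

print(f"Single turn: {single_turn:.2f}")  # 5.93
print(f"Multi turn:  {multi_turn:.2f}")   # 5.52
print(f"Overall:     {overall:.2f}")      # 5.73
```

Under these assumptions the computed values match the reported 5.93, 5.52, and 5.73.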


| Tasks  |Version|     Filter     |n-shot|        Metric         |   |Value |   |Stderr|
|--------|------:|----------------|-----:|-----------------------|---|-----:|---|------|
|gsm8k   |      3|flexible-extract|     5|exact_match            |↑  |0.7013|±  |0.0126|
|        |       |strict-match    |     5|exact_match            |↑  |0.2418|±  |0.0118|
|gsm8k-ko|      1|flexible-extract|     5|exact_match            |↑  |0.4466|±  |0.0137|
|        |       |strict-match    |     5|exact_match            |↑  |0.4420|±  |0.0137|
|ifeval  |      4|none            |     0|inst_level_loose_acc   |↑  |0.8549|±  |   N/A|
|        |       |none            |     0|inst_level_strict_acc  |↑  |0.8225|±  |   N/A|
|        |       |none            |     0|prompt_level_loose_acc |↑  |0.7874|±  |0.0176|
|        |       |none            |     0|prompt_level_strict_acc|↑  |0.7468|±  |0.0187|
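
One way to read the `Stderr` column is as an approximate 95% interval around each value. The sketch below assumes a normal approximation (value ± 1.96 × stderr), which is a common convention rather than something the source states. The helper name `interval` is hypothetical, and rows reported with `N/A` are omitted.

```python
# (value, stderr) pairs copied from the table above; rows with N/A stderr are skipped.
results = {
    "gsm8k flexible-extract": (0.7013, 0.0126),
    "gsm8k strict-match": (0.2418, 0.0118),
    "gsm8k-ko flexible-extract": (0.4466, 0.0137),
    "gsm8k-ko strict-match": (0.4420, 0.0137),
    "ifeval prompt_level_loose_acc": (0.7874, 0.0176),
    "ifeval prompt_level_strict_acc": (0.7468, 0.0187),
}

Z_95 = 1.96  # assumption: normal approximation for a 95% interval


def interval(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Return (lower, upper) bounds of value +/- z * stderr."""
    return value - z * stderr, value + z * stderr


for name, (value, stderr) in results.items():
    lower, upper = interval(value, stderr)
    print(f"{name}: {value:.4f} [{lower:.4f}, {upper:.4f}]")
```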