krishnateja95 committed on
Commit 2f2a1b4 · verified · 1 Parent(s): 11b59d3

Update README.md

Files changed (1)
  1. README.md +120 -0
README.md CHANGED
@@ -3,6 +3,126 @@ license: apache-2.0
  ---
 
 
+ ### Accuracy Comparison
+
+ <table>
+ <thead>
+ <tr>
+ <th>Category</th>
+ <th>Metric</th>
+ <th>ibm-granite/granite-4.0-h-small</th>
+ <th>ibm-granite/granite-4.0-h-small-FP8</th>
+ <th>RedHatAI/granite-4.0-h-small-FP8-block</th>
+ <th>RedHatAI/granite-4.0-h-small-FP8-dynamic</th>
+ </tr>
+ </thead>
+ <tbody>
+ <!-- OpenLLM Leaderboard V1 -->
+ <tr>
+ <td rowspan="7"><b>OpenLLM V1</b></td>
+ <td>ARC-Challenge (Acc-Norm, 25-shot)</td>
+ <td>72.27</td>
+ <td>72.10 (99.76%)</td>
+ <td>72.27 (100.00%)</td>
+ <td>72.10 (99.76%)</td>
+ </tr>
+ <tr>
+ <td>GSM8K (Strict-Match, 5-shot)</td>
+ <td>85.22</td>
+ <td>85.29 (100.09%)</td>
+ <td>85.52 (100.36%)</td>
+ <td>84.84 (99.56%)</td>
+ </tr>
+ <tr>
+ <td>HellaSwag (Acc-Norm, 10-shot)</td>
+ <td>86.08</td>
+ <td>85.88 (99.77%)</td>
+ <td>85.96 (99.86%)</td>
+ <td>85.88 (99.77%)</td>
+ </tr>
+ <tr>
+ <td>MMLU (Acc, 5-shot)</td>
+ <td>77.15</td>
+ <td>77.18 (100.03%)</td>
+ <td>77.23 (100.09%)</td>
+ <td>77.18 (100.03%)</td>
+ </tr>
+ <tr>
+ <td>TruthfulQA (MC2, 0-shot)</td>
+ <td>57.64</td>
+ <td>57.63 (99.99%)</td>
+ <td>57.94 (100.52%)</td>
+ <td>57.63 (100.00%)</td>
+ </tr>
+ <tr>
+ <td>Winogrande (Acc, 5-shot)</td>
+ <td>81.37</td>
+ <td>81.45 (100.10%)</td>
+ <td>80.82 (99.32%)</td>
+ <td>81.45 (100.10%)</td>
+ </tr>
+ <tr>
+ <td><b>Average Score</b></td>
+ <td><b>76.62</b></td>
+ <td><b>76.59 (99.96%)</b></td>
+ <td><b>76.62 (100.00%)</b></td>
+ <td><b>76.51 (99.86%)</b></td>
+ </tr>
+ <!-- OpenLLM Leaderboard V2 -->
+ <tr>
+ <td rowspan="7"><b>OpenLLM V2</b></td>
+ <td>IFEval (Inst Level Strict Acc, 0-shot)</td>
+ <td>87.53</td>
+ <td>87.17 (99.59%)</td>
+ <td>86.69 (99.04%)</td>
+ <td>87.41 (99.86%)</td>
+ </tr>
+ <tr>
+ <td>BBH (Acc-Norm, 3-shot)</td>
+ <td>61.52</td>
+ <td>61.31 (99.66%)</td>
+ <td>61.40 (99.80%)</td>
+ <td>61.19 (99.46%)</td>
+ </tr>
+ <tr>
+ <td>Math-Hard (Exact-Match, 4-shot)</td>
+ <td>46.22</td>
+ <td>43.73 (94.61%)</td>
+ <td>43.88 (94.93%)</td>
+ <td>41.77 (90.36%)</td>
+ </tr>
+ <tr>
+ <td>GPQA (Acc-Norm, 0-shot)</td>
+ <td>35.23</td>
+ <td>34.98 (99.29%)</td>
+ <td>34.23 (97.14%)</td>
+ <td>34.23 (97.14%)</td>
+ </tr>
+ <tr>
+ <td>MUSR (Acc-Norm, 0-shot)</td>
+ <td>46.69</td>
+ <td>46.56 (99.72%)</td>
+ <td>45.77 (98.02%)</td>
+ <td>45.77 (98.02%)</td>
+ </tr>
+ <tr>
+ <td>MMLU-Pro (Acc, 5-shot)</td>
+ <td>47.99</td>
+ <td>47.63 (99.26%)</td>
+ <td>47.93 (99.88%)</td>
+ <td>47.58 (99.15%)</td>
+ </tr>
+ <tr>
+ <td><b>Average Score</b></td>
+ <td><b>54.20</b></td>
+ <td><b>53.56 (98.82%)</b></td>
+ <td><b>53.32 (98.38%)</b></td>
+ <td><b>52.99 (97.77%)</b></td>
+ </tr>
+ </tbody>
+ </table>
+
+
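Reviewer note: the README does not say how the parenthesized percentages in the table are derived. They appear to be each quantized model's score as a percentage of the `ibm-granite/granite-4.0-h-small` baseline. The sketch below reproduces the Average Score rows under that assumption. The helper name `recovery` is hypothetical and not part of the commit. Per-benchmark rows may differ in the last digit when recomputed this way, presumably because the published percentages were calculated from unrounded scores.

```python
def recovery(quantized: float, baseline: float) -> float:
    """Return the quantized score as a percentage of the baseline score.

    Assumption: this is how the table's parenthesized values are defined;
    the README itself does not state the formula.
    """
    return round(100.0 * quantized / baseline, 2)


# OpenLLM V2 "Average Score" row, copied from the table above.
baseline_v2_avg = 54.20  # ibm-granite/granite-4.0-h-small
quantized_v2_avgs = {
    "ibm-granite/granite-4.0-h-small-FP8": 53.56,
    "RedHatAI/granite-4.0-h-small-FP8-block": 53.32,
    "RedHatAI/granite-4.0-h-small-FP8-dynamic": 52.99,
}

for model, score in quantized_v2_avgs.items():
    print(f"{model}: {score:.2f} ({recovery(score, baseline_v2_avg):.2f}%)")
```

Run against the average rows, this gives 98.82%, 98.38%, and 97.77%, matching the values shown in the table.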
  ### Accuracy
  <table>
  <thead>