Adding metrics for new benchmarks

#6
by tidealwari - opened
Files changed (1) hide show
  1. README.md +55 -3
README.md CHANGED
@@ -111,6 +111,7 @@ Please refer to [Qwen Documentation](https://qwen.readthedocs.io/en/latest/deplo
111
  Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
112
  We advise adding the `rope_scaling` configuration only when processing long contexts is required.
113
 
 
114
  ### Comparison of Qiskit models across benchmarks
115
 
116
  <table
@@ -141,35 +142,86 @@ We advise adding the `rope_scaling` configuration only when processing long cont
141
  <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
142
  HumanEval
143
  </th>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
144
  </tr>
145
  </thead>
146
  <tbody>
147
  <tr style="background:#f7fafc;">
148
  <td style="padding:12px 16px; font-weight:700; color:#07102a;">Qwen2.5-Coder-14B-Qiskit</td>
149
- <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">25.16</td>
150
  <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.01</td>
151
  <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">91.46</td>
 
 
 
 
 
 
 
152
  </tr>
153
  <tr style="background:#ffffff;">
154
  <td style="padding:12px 16px; color:#0f172a;">mistral-small-3.2-24b-qiskit</td>
155
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.53</td>
156
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.39</td>
157
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.49</td>
 
 
 
 
 
 
 
158
  </tr>
159
  <tr style="background:#ffffff;">
160
  <td style="padding:12px 16px; color:#0f172a;">granite-3.3-8b-qiskit</td>
161
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">14.56</td>
162
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">27.15</td>
163
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">62.80</td>
 
 
 
 
 
 
 
164
  </tr>
165
  <tr style="background:#fbfdff;">
166
  <td style="padding:12px 16px; color:#0f172a;">granite-3.2-8b-qiskit</td>
167
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">9.93</td>
168
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">24.50</td>
169
- <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">57.31</td>
 
 
 
 
 
 
 
170
  </tr>
171
  </tbody>
172
  </table>
 
 
173
 
174
 
175
  ## Training Data
 
111
  Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
112
  We advise adding the `rope_scaling` configuration only when processing long contexts is required.
113
 
114
+
115
  ### Comparison of Qiskit models across benchmarks
116
 
117
  <table
 
142
  <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
143
  HumanEval
144
  </th>
145
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
146
+ ASDiv
147
+ </th>
148
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
149
+ MathQA
150
+ </th>
151
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
152
+ SciQ
153
+ </th>
154
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
155
+ MBPP
156
+ </th>
157
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
158
+ IFEval
159
+ </th>
160
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
161
+ CrowsPairs (English)
162
+ </th>
163
+ <th style="text-align:center; padding:12px 16px; background:linear-gradient(90deg,#f6f8fb,#eef3f9); color:#0b1220; font-weight:700; border-bottom:1px solid rgba(15,23,42,0.06);">
164
+ TruthfulQA (MC1 acc)
165
+ </th>
166
  </tr>
167
  </thead>
168
  <tbody>
169
  <tr style="background:#f7fafc;">
170
  <td style="padding:12px 16px; font-weight:700; color:#07102a;">Qwen2.5-Coder-14B-Qiskit</td>
171
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">25.17</td>
172
  <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.01</td>
173
  <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">91.46</td>
174
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">4.21</td>
175
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">53.90</td>
176
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">97.00</td>
177
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.60</td>
178
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">49.64</td>
179
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">65.18</td>
180
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">37.82</td>
181
  </tr>
182
  <tr style="background:#ffffff;">
183
  <td style="padding:12px 16px; color:#0f172a;">mistral-small-3.2-24b-qiskit</td>
184
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.53</td>
185
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.39</td>
186
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">77.49</td>
187
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">20.69</td>
188
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">53.40</td>
189
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">96.40</td>
190
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">63.40</td>
191
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">31.66</td>
192
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">67.56</td>
193
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">42.84</td>
194
  </tr>
195
  <tr style="background:#ffffff;">
196
  <td style="padding:12px 16px; color:#0f172a;">granite-3.3-8b-qiskit</td>
197
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">14.57</td>
198
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">27.15</td>
199
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">62.80</td>
200
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">0.48</td>
201
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">38.66</td>
202
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">93.30</td>
203
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">52.40</td>
204
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">59.71</td>
205
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">59.75</td>
206
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">39.05</td>
207
  </tr>
208
  <tr style="background:#fbfdff;">
209
  <td style="padding:12px 16px; color:#0f172a;">granite-3.2-8b-qiskit</td>
210
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">9.93</td>
211
  <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">24.50</td>
212
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">57.32</td>
213
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">0.09</td>
214
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">41.41</td>
215
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">96.30</td>
216
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">51.80</td>
217
+ <td style="padding:12px 16px; text-align:center; font-weight:700; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">60.79</td>
218
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">66.79</td>
219
+ <td style="padding:12px 16px; text-align:center; font-family:ui-monospace, SFMono-Regular, Menlo, Monaco, monospace;">40.51</td>
220
  </tr>
221
  </tbody>
222
  </table>
223
+
224
+ *Note: All models listed in the benchmark table were evaluated using their respective system prompt, defined in their Hugging Face model.*
225
 
226
 
227
  ## Training Data