
Added Average score for text benchmark

#4 by davasam - opened

Files changed (1): README.md (+22 −7)
README.md CHANGED

```diff
--- README.md
+++ README.md
@@ -62,7 +62,18 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
   <th>Claude 4.5 Sonnet (thinking)</th>
   <th>o3-mini (high)</th>
   </tr>
-
+  <tr>
+  <td></td>
+  <td>Average Score**</td>
+  <td>53.22</td>
+  <td>46.56</td>
+  <td>52.56</td>
+  <td>51.92</td>
+  <td>50.71</td>
+  <td>62.58</td>
+  <td>60.37</td>
+  <td>48.85</td>
+  </tr>
   <!-- Function Calling -->
   <tr>
   <td rowspan="5" class="category">Function Calling</td>
@@ -199,7 +210,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
   <td>LCB</td>
   <td>81</td>
   <td>73</td>
-  <td>65</td>
+  <td>88</td>
   <td>77</td>
   <td>70</td>
   <td>84</td>
@@ -210,7 +221,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
   <td>SciCode</td>
   <td>37</td>
   <td>35</td>
-  <td>36</td>
+  <td>39</td>
   <td>40</td>
   <td>41</td>
   <td>39</td>
@@ -244,7 +255,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
   </tr>
   <tr>
   <td>Work-Arena L1</td>
-  <td>58</td>
+  <td>50.2</td>
   <td>51.5</td>
   <td>50.9</td>
   <td>63.9</td>
@@ -304,7 +315,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
   <td>MMLU Pro</td>
   <td>79</td>
   <td>77</td>
-  <td>85</td>
+  <td>81</td>
   <td>85</td>
   <td>83</td>
   <td>84</td>
@@ -362,13 +373,17 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
   <td>62</td>
   <td>68</td>
   <td>66</td>
-  <td>-</td>
+  <td>30***</td>
   </tr>
   </table>



-\* AA LCR score in the table is with [DCA](https://arxiv.org/pdf/2402.17463) enabled. With default config, the model scores 36 on AA LCR.
+\* This score is with [DCA](https://arxiv.org/pdf/2402.17463) enabled. Without it, the model scores 36.
+
+\** The average score is calculated using all benchmarks except BFCL v3 Only and DeepResearchBench, since some models do not have scores for these two benchmarks.
+
+\*** The AA LCR score for o3-mini (high) is a projected score based on its AA Index score.

 ---
```
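Footnote ** describes the convention behind the new Average Score row: benchmarks that not every model reports (BFCL v3 Only, DeepResearchBench) are excluded before averaging, so all models are compared over the same set. A minimal sketch of that convention — the model names and scores below are made-up placeholders, not values from this table:

```python
# Hedged sketch of the footnote-** averaging rule: average only over
# benchmarks for which EVERY model reports a score. All data here is
# hypothetical placeholder data, not taken from the model card.
scores = {
    "Model A": {"LCB": 81, "SciCode": 37, "MMLU Pro": 79, "BFCL v3 Only": None},
    "Model B": {"LCB": 73, "SciCode": 35, "MMLU Pro": 77, "BFCL v3 Only": 55},
}

# Keep only benchmarks scored by all models ("BFCL v3 Only" is dropped
# because Model A has no score for it).
shared = [b for b in next(iter(scores.values()))
          if all(m[b] is not None for m in scores.values())]

# Per-model average over the shared benchmarks, rounded to two decimals
# as in the table.
averages = {name: round(sum(m[b] for b in shared) / len(shared), 2)
            for name, m in scores.items()}
# averages == {"Model A": 65.67, "Model B": 61.67}
```

Dropping non-shared benchmarks (rather than averaging each model over whatever it has) keeps the per-model averages directly comparable, which is presumably why the PR excludes those two benchmarks.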