Added Average score for text benchmark
#4
by davasam - opened

README.md CHANGED
@@ -62,7 +62,18 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
 <th>Claude 4.5 Sonnet (thinking)</th>
 <th>o3-mini (high)</th>
 </tr>
-
+<tr>
+<td></td>
+<td>Average Score**</td>
+<td>53.22</td>
+<td>46.56</td>
+<td>52.56</td>
+<td>51.92</td>
+<td>50.71</td>
+<td>62.58</td>
+<td>60.37</td>
+<td>48.85</td>
+</tr>
 <!-- Function Calling -->
 <tr>
 <td rowspan="5" class="category">Function Calling</td>
@@ -199,7 +210,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
 <td>LCB</td>
 <td>81</td>
 <td>73</td>
-<td>
+<td>88</td>
 <td>77</td>
 <td>70</td>
 <td>84</td>
@@ -210,7 +221,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
 <td>SciCode</td>
 <td>37</td>
 <td>35</td>
-<td>
+<td>39</td>
 <td>40</td>
 <td>41</td>
 <td>39</td>
@@ -244,7 +255,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
 </tr>
 <tr>
 <td>Work-Arena L1</td>
-<td>
+<td>50.2</td>
 <td>51.5</td>
 <td>50.9</td>
 <td>63.9</td>
@@ -304,7 +315,7 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
 <td>MMLU Pro</td>
 <td>79</td>
 <td>77</td>
-<td>
+<td>81</td>
 <td>85</td>
 <td>83</td>
 <td>84</td>
@@ -362,13 +373,17 @@ Please see our [blog post](https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-
 <td>62</td>
 <td>68</td>
 <td>66</td>
-<td>
+<td>30***</td>
 </tr>
 </table>
 
 
 
-\*
+\* This score is with [DCA](https://arxiv.org/pdf/2402.17463) enabled. Without this, the model scores 36.
+
+\** The average score is calculated using all benchmarks except BFCL v3 Only and DeepResearchBench, since some models do not have scores for these two benchmarks.
+
+\*** AA LCR score for o3-mini-high is a projected score based on its AA Index score.
 
 ---
 
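The averaging rule stated in the PR's footnote (mean over all benchmarks except BFCL v3 Only and DeepResearchBench, which not every model reports) can be sketched as follows. The benchmark names and score values below are hypothetical placeholders for illustration, not the model's real per-benchmark results from the README.

```python
# Sketch of the footnote's averaging rule: average over all benchmarks
# EXCEPT "BFCL v3 Only" and "DeepResearchBench".
# NOTE: these scores are made-up placeholders, not real README values.
scores = {
    "LCB": 81,
    "SciCode": 37,
    "MMLU Pro": 79,
    "BFCL v3 Only": 64,        # excluded: not reported for every model
    "DeepResearchBench": 12,   # excluded: not reported for every model
}

EXCLUDED = {"BFCL v3 Only", "DeepResearchBench"}

included = [v for k, v in scores.items() if k not in EXCLUDED]
average = round(sum(included) / len(included), 2)
print(average)  # (81 + 37 + 79) / 3 = 65.67
```

Excluding the two sparsely reported benchmarks keeps the averages comparable across models, since a missing score would otherwise either shrink a model's denominator or force imputing a value.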