Slim Frikha
committed on
fix benchs
README.md
CHANGED
@@ -130,9 +130,9 @@ We report in the following table our internal pipeline benchmarks.
 </tr>
 <tr>
 <td>IFEval</td>
-<td><b>
-<td>64.
-<td>66.
+<td><b>74.7</b></td>
+<td>64.1</td>
+<td>66.3</td>
 <td>68.3</td>
 </tr>
 <tr>
@@ -167,7 +167,7 @@ We report in the following table our internal pipeline benchmarks.
 </tr>
 <tr>
 <td>GPQA (0-shot)</td>
-<td>
+<td>32.2</td>
 <td>29.2</td>
 <td>27.0</td>
 <td><b>29.6</b></td>