Update README.md
README.md
CHANGED
|
@@ -925,22 +925,153 @@ Here, we present results for seven categories of tasks in Spanish, Catalan, Basq
|
|
| 925 |
|
| 926 |
Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
|
| 927 |
|
| 928 |
-
|
| 929 |
-
|
| 930 |
-
|
| 931 |
-
|
| 932 |
-
|
| 933 |
-
|
| 934 |
-
|
| 935 |
-
|
| 936 |
-
|
| 937 |
-
|
| 938 |
-
|
| 939 |
-
|
| 940 |
-
|
| 941 |
-
|
| 942 |
-
|
| 943 |
-
|
| 944 |
|
| 945 |
---
|
| 946 |
|
|
|
|
| 925 |
|
| 927 |
|
| 928 |
+
<style type="text/css">
|
| 929 |
+
.tg {border-collapse:collapse;border-spacing:0;}
|
| 930 |
+
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
|
| 931 |
+
overflow:hidden;padding:10px 5px;word-break:normal;}
|
| 932 |
+
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
|
| 933 |
+
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
|
| 934 |
+
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
|
| 935 |
+
</style>
|
| 936 |
+
<table class="tg"><thead>
|
| 937 |
+
<tr>
|
| 938 |
+
<th class="tg-0pky"><span style="font-weight:bold">Category</span></th>
|
| 939 |
+
<th class="tg-0pky"><span style="font-weight:bold">Dataset</span></th>
|
| 940 |
+
<th class="tg-0pky"><span style="font-weight:bold">Criteria</span></th>
|
| 941 |
+
<th class="tg-0pky"><span style="font-weight:bold">es</span></th>
|
| 942 |
+
<th class="tg-0pky"><span style="font-weight:bold">ca</span></th>
|
| 943 |
+
<th class="tg-0pky"><span style="font-weight:bold">gl</span></th>
|
| 944 |
+
<th class="tg-0pky"><span style="font-weight:bold">eu</span></th>
|
| 945 |
+
<th class="tg-0pky"><span style="font-weight:bold">en</span></th>
|
| 946 |
+
</tr></thead>
|
| 947 |
+
<tbody>
|
| 948 |
+
<tr>
|
| 949 |
+
<td class="tg-0pky">Commonsense Reasoning</td>
|
| 950 |
+
<td class="tg-0pky">XStoryCloze</td>
|
| 951 |
+
<td class="tg-0pky">Ending coherence</td>
|
| 952 |
+
<td class="tg-0pky">3.24/0.63</td>
|
| 953 |
+
<td class="tg-0pky">3.12/0.51</td>
|
| 954 |
+
<td class="tg-0pky">2.87/0.59</td>
|
| 955 |
+
<td class="tg-0pky">2.16/0.52</td>
|
| 956 |
+
<td class="tg-0pky">3.71/0.50</td>
|
| 957 |
+
</tr>
|
| 958 |
+
<tr>
|
| 959 |
+
<td class="tg-0pky" rowspan="3">Paraphrasing</td>
|
| 960 |
+
<td class="tg-0pky" rowspan="3">PAWS</td>
|
| 961 |
+
<td class="tg-0pky">Completeness <code>(B)</code></td>
|
| 962 |
+
<td class="tg-0pky">0.86/0.07</td>
|
| 963 |
+
<td class="tg-0pky">0.82/0.09</td>
|
| 964 |
+
<td class="tg-0pky">0.78/0.10</td>
|
| 965 |
+
<td class="tg-0pky">-- / --</td>
|
| 966 |
+
<td class="tg-0pky">0.92/0.05</td>
|
| 967 |
+
</tr>
|
| 968 |
+
<tr>
|
| 969 |
+
<td class="tg-0pky">Paraphrase generation</td>
|
| 970 |
+
<td class="tg-0pky">3.81/0.54</td>
|
| 971 |
+
<td class="tg-0pky">3.67/0.55</td>
|
| 972 |
+
<td class="tg-0pky">3.56/0.57</td>
|
| 973 |
+
<td class="tg-0pky">-- / --</td>
|
| 974 |
+
<td class="tg-0pky">3.98/0.37</td>
|
| 975 |
+
</tr>
|
| 976 |
+
<tr>
|
| 977 |
+
<td class="tg-0pky">Grammatical correctness <code>(B)</code></td>
|
| 978 |
+
<td class="tg-0pky">0.93/0.03</td>
|
| 979 |
+
<td class="tg-0pky">0.92/0.05</td>
|
| 980 |
+
<td class="tg-0pky">0.89/0.06</td>
|
| 981 |
+
<td class="tg-0pky">-- / --</td>
|
| 982 |
+
<td class="tg-0pky">0.96/0.03</td>
|
| 983 |
+
</tr>
|
| 984 |
+
<tr>
|
| 985 |
+
<td class="tg-0pky" rowspan="2">Reading Comprehension</td>
|
| 986 |
+
<td class="tg-0pky" rowspan="2">Belebele</td>
|
| 987 |
+
<td class="tg-0pky">Passage comprehension</td>
|
| 988 |
+
<td class="tg-0pky">3.43/0.43</td>
|
| 989 |
+
<td class="tg-0pky">3.28/0.50</td>
|
| 990 |
+
<td class="tg-0pky">3.02/0.56</td>
|
| 991 |
+
<td class="tg-0pky">2.61/0.43</td>
|
| 992 |
+
<td class="tg-0pky">3.43/0.58</td>
|
| 993 |
+
</tr>
|
| 994 |
+
<tr>
|
| 995 |
+
<td class="tg-0pky">Answer relevance <code>(B)</code></td>
|
| 996 |
+
<td class="tg-0pky">0.86/0.05</td>
|
| 997 |
+
<td class="tg-0pky">0.84/0.05</td>
|
| 998 |
+
<td class="tg-0pky">0.75/0.08</td>
|
| 999 |
+
<td class="tg-0pky">0.65/0.11</td>
|
| 1000 |
+
<td class="tg-0pky">0.83/0.06</td>
|
| 1001 |
+
</tr>
|
| 1002 |
+
<tr>
|
| 1003 |
+
<td class="tg-0pky" rowspan="2">Extreme Summarization</td>
|
| 1004 |
+
<td class="tg-0pky" rowspan="2">XLSum &amp; caBreu &amp; summarization_gl</td>
|
| 1005 |
+
<td class="tg-0pky">Informativeness</td>
|
| 1006 |
+
<td class="tg-0pky">3.37/0.34</td>
|
| 1007 |
+
<td class="tg-0pky">3.57/0.31</td>
|
| 1008 |
+
<td class="tg-0pky">3.40/0.31</td>
|
| 1009 |
+
<td class="tg-0pky">-- / --</td>
|
| 1010 |
+
<td class="tg-0pky">3.32/0.26</td>
|
| 1011 |
+
</tr>
|
| 1012 |
+
<tr>
|
| 1013 |
+
<td class="tg-0pky">Conciseness</td>
|
| 1014 |
+
<td class="tg-0pky">3.06/0.34</td>
|
| 1015 |
+
<td class="tg-0pky">2.88/0.50</td>
|
| 1016 |
+
<td class="tg-0pky">3.09/0.38</td>
|
| 1017 |
+
<td class="tg-0pky">-- / --</td>
|
| 1018 |
+
<td class="tg-0pky">3.32/0.22</td>
|
| 1019 |
+
</tr>
|
| 1020 |
+
<tr>
|
| 1021 |
+
<td class="tg-0pky" rowspan="2">Math</td>
|
| 1022 |
+
<td class="tg-0pky" rowspan="2">MGSM</td>
|
| 1023 |
+
<td class="tg-0pky">Reasoning capability</td>
|
| 1024 |
+
<td class="tg-0pky">3.29/0.72</td>
|
| 1025 |
+
<td class="tg-0pky">3.16/0.65</td>
|
| 1026 |
+
<td class="tg-0pky">3.33/0.60</td>
|
| 1027 |
+
<td class="tg-0pky">2.56/0.52</td>
|
| 1028 |
+
<td class="tg-0pky">3.35/0.65</td>
|
| 1029 |
+
</tr>
|
| 1030 |
+
<tr>
|
| 1031 |
+
<td class="tg-0pky">Mathematical correctness <code>(B)</code></td>
|
| 1032 |
+
<td class="tg-0pky">0.68/0.12</td>
|
| 1033 |
+
<td class="tg-0pky">0.65/0.13</td>
|
| 1034 |
+
<td class="tg-0pky">0.73/0.11</td>
|
| 1035 |
+
<td class="tg-0pky">0.59/0.13</td>
|
| 1036 |
+
<td class="tg-0pky">0.67/0.12</td>
|
| 1037 |
+
</tr>
|
| 1038 |
+
<tr>
|
| 1039 |
+
<td class="tg-0pky" rowspan="2">Translation from Language</td>
|
| 1040 |
+
<td class="tg-0pky" rowspan="2">FLORES-200</td>
|
| 1041 |
+
<td class="tg-0pky">Fluency</td>
|
| 1042 |
+
<td class="tg-0pky">3.95/0.11</td>
|
| 1043 |
+
<td class="tg-0pky">3.88/0.15</td>
|
| 1044 |
+
<td class="tg-0pky">-- / --</td>
|
| 1045 |
+
<td class="tg-0pky">-- / --</td>
|
| 1046 |
+
<td class="tg-0pky">3.92/0.14</td>
|
| 1047 |
+
</tr>
|
| 1048 |
+
<tr>
|
| 1049 |
+
<td class="tg-0pky">Accuracy</td>
|
| 1050 |
+
<td class="tg-0pky">4.22/0.15</td>
|
| 1051 |
+
<td class="tg-0pky">4.25/0.21</td>
|
| 1052 |
+
<td class="tg-0pky">-- / --</td>
|
| 1053 |
+
<td class="tg-0pky">-- / --</td>
|
| 1054 |
+
<td class="tg-0pky">4.25/0.23</td>
|
| 1055 |
+
</tr>
|
| 1056 |
+
<tr>
|
| 1057 |
+
<td class="tg-0pky" rowspan="2">Translation to Language</td>
|
| 1058 |
+
<td class="tg-0pky" rowspan="2">FLORES-200</td>
|
| 1059 |
+
<td class="tg-0pky">Fluency</td>
|
| 1060 |
+
<td class="tg-0pky">3.92/0.11</td>
|
| 1061 |
+
<td class="tg-0pky">3.84/0.14</td>
|
| 1062 |
+
<td class="tg-0pky">-- / --</td>
|
| 1063 |
+
<td class="tg-0pky">-- / --</td>
|
| 1064 |
+
<td class="tg-0pky">4.19/0.14</td>
|
| 1065 |
+
</tr>
|
| 1066 |
+
<tr>
|
| 1067 |
+
<td class="tg-0pky">Accuracy</td>
|
| 1068 |
+
<td class="tg-0pky">4.31/0.16</td>
|
| 1069 |
+
<td class="tg-0pky">4.18/0.20</td>
|
| 1070 |
+
<td class="tg-0pky">-- / --</td>
|
| 1071 |
+
<td class="tg-0pky">-- / --</td>
|
| 1072 |
+
<td class="tg-0pky">4.63/0.15</td>
|
| 1073 |
+
</tr>
|
| 1074 |
+
</tbody></table>
|
| 1075 |
|
| 1076 |
---
|
| 1077 |
|