andrewzamai committed on
Commit 76efe2f · verified · 1 Parent(s): 51cc6c6

Update README.md

Files changed (1)
  1. README.md +172 -3
README.md CHANGED
```diff
@@ -20,10 +20,179 @@ SLIMER performs comparably to these state-of-the-art models on OOD input domains
 
 <img src="https://huggingface.co/expertai/SLIMER/resolve/main/OOD_evals.png">
 
-To experiment the ability of existing models on never-seen-before labels, we extend the standard zero-shot evaluations on BUSTER, which is characterized by financial entities that are rather far from the more traditional tags observed by all models during training.
-An inverse trend to the OOD table can be observed, with SLIMER instead emerging as the most effective in dealing with unseen labels, thanks to its lighter instruction tuning methodology and the use of definition and guidelines.
+We extend the standard zero-shot evaluations on BUSTER, which is characterized by financial entities that are rather far from the more traditional tags observed by all models during training.
+An inverse trend can be observed, with SLIMER instead emerging as the most effective in dealing with these unseen labels, thanks to its lighter instruction tuning methodology and the use of definition and guidelines.
 
-<img src="https://huggingface.co/expertai/SLIMER/resolve/main/BUSTERvsOOD.png" width="250">
+<table>
+<thead>
+<tr>
+<th>Model</th>
+<th>Backbone</th>
+<th>#Params</th>
+<th colspan="2">MIT</th>
+<th colspan="5">CrossNER</th>
+<th>AVG</th>
+</tr>
+<tr>
+<th></th>
+<th></th>
+<th></th>
+<th>Movie</th>
+<th>Restaurant</th>
+<th>AI</th>
+<th>Literature</th>
+<th>Music</th>
+<th>Politics</th>
+<th>Science</th>
+<th></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>ChatGPT</td>
+<td>gpt-3.5-turbo</td>
+<td>-</td>
+<td>5.3</td>
+<td>32.8</td>
+<td>52.4</td>
+<td>39.8</td>
+<td>66.6</td>
+<td>68.5</td>
+<td>67.0</td>
+<td>47.5</td>
+</tr>
+<tr>
+<td>InstructUIE</td>
+<td>Flan-T5-xxl</td>
+<td>11B</td>
+<td>63.0</td>
+<td>21.0</td>
+<td>49.0</td>
+<td>47.2</td>
+<td>53.2</td>
+<td>48.2</td>
+<td>49.3</td>
+<td>47.3</td>
+</tr>
+<tr>
+<td>UniNER-type</td>
+<td>LLaMA-1</td>
+<td>7B</td>
+<td>42.4</td>
+<td>31.7</td>
+<td>53.5</td>
+<td>59.4</td>
+<td>65.0</td>
+<td>60.8</td>
+<td>61.1</td>
+<td>53.4</td>
+</tr>
+<tr>
+<td>UniNER-def</td>
+<td>LLaMA-1</td>
+<td>7B</td>
+<td>27.1</td>
+<td>27.9</td>
+<td>44.5</td>
+<td>49.2</td>
+<td>55.8</td>
+<td>57.5</td>
+<td>52.9</td>
+<td>45.0</td>
+</tr>
+<tr>
+<td>UniNER-type+sup.</td>
+<td>LLaMA-1</td>
+<td>7B</td>
+<td>61.2</td>
+<td>35.2</td>
+<td>62.9</td>
+<td>64.9</td>
+<td>70.6</td>
+<td>66.9</td>
+<td>70.8</td>
+<td>61.8</td>
+</tr>
+<tr>
+<td>GoLLIE</td>
+<td>Code-LLaMA</td>
+<td>7B</td>
+<td>63.0</td>
+<td>43.4</td>
+<td>59.1</td>
+<td>62.7</td>
+<td>67.8</td>
+<td>57.2</td>
+<td>55.5</td>
+<td>58.4</td>
+</tr>
+<tr>
+<td>GLiNER-L</td>
+<td>DeBERTa-v3</td>
+<td>0.3B</td>
+<td>57.2</td>
+<td>42.9</td>
+<td>57.2</td>
+<td>64.4</td>
+<td>69.6</td>
+<td>72.6</td>
+<td>62.6</td>
+<td>60.9</td>
+</tr>
+<tr>
+<td>GNER-T5</td>
+<td>Flan-T5-xxl</td>
+<td>11B</td>
+<td>62.5</td>
+<td>51.0</td>
+<td>68.2</td>
+<td>68.7</td>
+<td>81.2</td>
+<td>75.1</td>
+<td>76.7</td>
+<td>69.1</td>
+</tr>
+<tr>
+<td>GNER-LLaMA</td>
+<td>LLaMA-1</td>
+<td>7B</td>
+<td>68.6</td>
+<td>47.5</td>
+<td>63.1</td>
+<td>68.2</td>
+<td>75.7</td>
+<td>69.4</td>
+<td>69.9</td>
+<td>66.1</td>
+</tr>
+<tr>
+<td>SLIMER w/o D&amp;G</td>
+<td>LLaMA-2-chat</td>
+<td>7B</td>
+<td>46.4 &plusmn; 1.8</td>
+<td>36.3 &plusmn; 2.1</td>
+<td>49.6 &plusmn; 3.2</td>
+<td>58.4 &plusmn; 1.7</td>
+<td>56.8 &plusmn; 2.1</td>
+<td>57.9 &plusmn; 2.1</td>
+<td>53.8 &plusmn; 1.7</td>
+<td>51.3 &plusmn; 2.0</td>
+</tr>
+<tr>
+<td><b>SLIMER</b></td>
+<td><b>LLaMA-2-chat</b></td>
+<td><b>7B</b></td>
+<td><b>50.9 &plusmn; 0.9</b></td>
+<td><b>38.2 &plusmn; 0.3</b></td>
+<td><b>50.1 &plusmn; 2.4</b></td>
+<td><b>58.7 &plusmn; 0.2</b></td>
+<td><b>60.0 &plusmn; 0.5</b></td>
+<td><b>63.9 &plusmn; 1.0</b></td>
+<td><b>56.3 &plusmn; 0.6</b></td>
+<td><b>54.0 &plusmn; 0.5</b></td>
+</tr>
+</tbody>
+</table>
 
 
 ```python
```
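The AVG column in the table this commit adds to README.md appears to be the unweighted arithmetic mean of the seven per-dataset scores (MIT Movie and Restaurant plus the five CrossNER domains). The minimal Python sketch below recomputes it under that assumption and flags any row whose recomputed mean differs from the reported AVG by more than one-decimal rounding allows. For the two SLIMER rows only the value before the "±" is used. The names `ROWS`, `macro_average` and `check_averages` are illustrative and not part of the SLIMER repository.

```python
# Per-dataset scores copied from the README table, in column order:
# MIT Movie, MIT Restaurant, CrossNER AI, Literature, Music, Politics, Science,
# followed by the reported AVG. For SLIMER rows, only the value before "±" is kept.
ROWS: dict[str, tuple[list[float], float]] = {
    "ChatGPT": ([5.3, 32.8, 52.4, 39.8, 66.6, 68.5, 67.0], 47.5),
    "InstructUIE": ([63.0, 21.0, 49.0, 47.2, 53.2, 48.2, 49.3], 47.3),
    "UniNER-type": ([42.4, 31.7, 53.5, 59.4, 65.0, 60.8, 61.1], 53.4),
    "UniNER-def": ([27.1, 27.9, 44.5, 49.2, 55.8, 57.5, 52.9], 45.0),
    "UniNER-type+sup.": ([61.2, 35.2, 62.9, 64.9, 70.6, 66.9, 70.8], 61.8),
    "GoLLIE": ([63.0, 43.4, 59.1, 62.7, 67.8, 57.2, 55.5], 58.4),
    "GLiNER-L": ([57.2, 42.9, 57.2, 64.4, 69.6, 72.6, 62.6], 60.9),
    "GNER-T5": ([62.5, 51.0, 68.2, 68.7, 81.2, 75.1, 76.7], 69.1),
    "GNER-LLaMA": ([68.6, 47.5, 63.1, 68.2, 75.7, 69.4, 69.9], 66.1),
    "SLIMER w/o D&G": ([46.4, 36.3, 49.6, 58.4, 56.8, 57.9, 53.8], 51.3),
    "SLIMER": ([50.9, 38.2, 50.1, 58.7, 60.0, 63.9, 56.3], 54.0),
}


def macro_average(scores: list[float]) -> float:
    """Unweighted mean over datasets: every dataset counts equally."""
    return sum(scores) / len(scores)


def check_averages(
    rows: dict[str, tuple[list[float], float]], tol: float = 0.05
) -> list[tuple[str, float, float]]:
    """Return (model, recomputed, reported) for every row whose recomputed
    mean is farther than `tol` from the reported AVG (0.05 = one-decimal rounding)."""
    mismatches = []
    for model, (scores, reported) in rows.items():
        recomputed = macro_average(scores)
        if abs(recomputed - reported) > tol:
            mismatches.append((model, round(recomputed, 2), reported))
    return mismatches


if __name__ == "__main__":
    for model, (scores, reported) in ROWS.items():
        print(f"{model:18s} recomputed={macro_average(scores):.2f} reported={reported}")
    print("mismatches:", check_averages(ROWS))
```

Averaging per dataset gives each of the seven domains equal weight regardless of how many test entities it contains; a mention-weighted average over the pooled test sets would generally produce different numbers.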