Dexifried committed on
Commit db34ed6 · verified · 1 Parent(s): e293a2d

Tiny-router checkpoint (encoder=microsoft/MiniLM-L12-H384-uncased, epochs=10)
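
The `history.json` added below stores, for each epoch and each head, an `accuracy`, a `macro_f1`, and a `confusion_matrix`. As a rough sanity check, the first two can be recomputed from the third. The sketch below assumes that rows are true labels and columns are predicted labels (the row sums match the reported `support` values), and that a label with no true or predicted examples gets an F1 of 0.0. The helper name `head_metrics` is hypothetical and not part of the repository.

```python
from typing import Dict, List


def head_metrics(matrix: List[List[int]]) -> Dict[str, float]:
    """Recompute accuracy and macro-F1 from a square confusion matrix.

    Assumption: matrix[i][j] counts examples with true label i
    that were predicted as label j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))

    f1_scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, true label differs
        fn = sum(matrix[i]) - tp                       # true i, predicted otherwise
        denom = 2 * tp + fp + fn
        # Assumption: undefined F1 (no true or predicted examples) counts as 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return {
        "accuracy": correct / total if total else 0.0,
        "macro_f1": sum(f1_scores) / n if n else 0.0,
    }


# Epoch 1, head "relation_to_previous", copied from history.json.
epoch1_relation = [
    [102, 0, 0, 0, 0, 0],
    [0, 85, 0, 0, 0, 0],
    [0, 29, 0, 0, 0, 0],
    [0, 20, 0, 0, 0, 0],
    [0, 21, 0, 0, 0, 0],
    [0, 19, 0, 0, 0, 0],
]

metrics = head_metrics(epoch1_relation)
print(round(metrics["accuracy"], 4))  # 0.6775, matches the reported accuracy
print(round(metrics["macro_f1"], 4))  # 0.2761, matches the reported macro_f1
```

The same check can be pointed at any other head or epoch by pasting in its `confusion_matrix`.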

history.json ADDED
@@ -0,0 +1,2697 @@
1
+ {
2
+ "epochs": [
3
+ {
4
+ "per_head": {
5
+ "relation_to_previous": {
6
+ "accuracy": 0.6775,
7
+ "macro_f1": 0.2761,
8
+ "per_label": {
9
+ "new": {
10
+ "precision": 1.0,
11
+ "recall": 1.0,
12
+ "f1": 1.0,
13
+ "support": 102
14
+ },
15
+ "follow_up": {
16
+ "precision": 0.4885,
17
+ "recall": 1.0,
18
+ "f1": 0.6564,
19
+ "support": 85
20
+ },
21
+ "correction": {
22
+ "precision": 0.0,
23
+ "recall": 0.0,
24
+ "f1": 0.0,
25
+ "support": 29
26
+ },
27
+ "confirmation": {
28
+ "precision": 0.0,
29
+ "recall": 0.0,
30
+ "f1": 0.0,
31
+ "support": 20
32
+ },
33
+ "cancellation": {
34
+ "precision": 0.0,
35
+ "recall": 0.0,
36
+ "f1": 0.0,
37
+ "support": 21
38
+ },
39
+ "closure": {
40
+ "precision": 0.0,
41
+ "recall": 0.0,
42
+ "f1": 0.0,
43
+ "support": 19
44
+ }
45
+ },
46
+ "confusion_matrix": [
47
+ [
48
+ 102,
49
+ 0,
50
+ 0,
51
+ 0,
52
+ 0,
53
+ 0
54
+ ],
55
+ [
56
+ 0,
57
+ 85,
58
+ 0,
59
+ 0,
60
+ 0,
61
+ 0
62
+ ],
63
+ [
64
+ 0,
65
+ 29,
66
+ 0,
67
+ 0,
68
+ 0,
69
+ 0
70
+ ],
71
+ [
72
+ 0,
73
+ 20,
74
+ 0,
75
+ 0,
76
+ 0,
77
+ 0
78
+ ],
79
+ [
80
+ 0,
81
+ 21,
82
+ 0,
83
+ 0,
84
+ 0,
85
+ 0
86
+ ],
87
+ [
88
+ 0,
89
+ 19,
90
+ 0,
91
+ 0,
92
+ 0,
93
+ 0
94
+ ]
95
+ ]
96
+ },
97
+ "actionability": {
98
+ "accuracy": 0.4891,
99
+ "macro_f1": 0.219,
100
+ "per_label": {
101
+ "none": {
102
+ "precision": 0.0,
103
+ "recall": 0.0,
104
+ "f1": 0.0,
105
+ "support": 61
106
+ },
107
+ "review": {
108
+ "precision": 0.0,
109
+ "recall": 0.0,
110
+ "f1": 0.0,
111
+ "support": 79
112
+ },
113
+ "act": {
114
+ "precision": 0.4909,
115
+ "recall": 0.9926,
116
+ "f1": 0.6569,
117
+ "support": 136
118
+ }
119
+ },
120
+ "confusion_matrix": [
121
+ [
122
+ 0,
123
+ 0,
124
+ 61
125
+ ],
126
+ [
127
+ 0,
128
+ 0,
129
+ 79
130
+ ],
131
+ [
132
+ 1,
133
+ 0,
134
+ 135
135
+ ]
136
+ ]
137
+ },
138
+ "retention": {
139
+ "accuracy": 0.5181,
140
+ "macro_f1": 0.2477,
141
+ "per_label": {
142
+ "ephemeral": {
143
+ "precision": 1.0,
144
+ "recall": 0.0337,
145
+ "f1": 0.0652,
146
+ "support": 89
147
+ },
148
+ "useful": {
149
+ "precision": 0.5128,
150
+ "recall": 1.0,
151
+ "f1": 0.678,
152
+ "support": 140
153
+ },
154
+ "remember": {
155
+ "precision": 0.0,
156
+ "recall": 0.0,
157
+ "f1": 0.0,
158
+ "support": 47
159
+ }
160
+ },
161
+ "confusion_matrix": [
162
+ [
163
+ 3,
164
+ 86,
165
+ 0
166
+ ],
167
+ [
168
+ 0,
169
+ 140,
170
+ 0
171
+ ],
172
+ [
173
+ 0,
174
+ 47,
175
+ 0
176
+ ]
177
+ ]
178
+ },
179
+ "urgency": {
180
+ "accuracy": 0.4928,
181
+ "macro_f1": 0.2201,
182
+ "per_label": {
183
+ "low": {
184
+ "precision": 0.4928,
185
+ "recall": 1.0,
186
+ "f1": 0.6602,
187
+ "support": 136
188
+ },
189
+ "medium": {
190
+ "precision": 0.0,
191
+ "recall": 0.0,
192
+ "f1": 0.0,
193
+ "support": 95
194
+ },
195
+ "high": {
196
+ "precision": 0.0,
197
+ "recall": 0.0,
198
+ "f1": 0.0,
199
+ "support": 45
200
+ }
201
+ },
202
+ "confusion_matrix": [
203
+ [
204
+ 136,
205
+ 0,
206
+ 0
207
+ ],
208
+ [
209
+ 95,
210
+ 0,
211
+ 0
212
+ ],
213
+ [
214
+ 45,
215
+ 0,
216
+ 0
217
+ ]
218
+ ]
219
+ }
220
+ },
221
+ "overall": {
222
+ "exact_match": 0.0616,
223
+ "macro_average_f1": 0.2407,
224
+ "automation_safe_accuracy": 0.0,
225
+ "automation_safe_coverage": 0.0,
226
+ "confidence_threshold": 0.8,
227
+ "confidence_calibration": {
228
+ "ece": 0.463661,
229
+ "bins": [
230
+ {
231
+ "range": [
232
+ 0.4,
233
+ 0.5
234
+ ],
235
+ "count": 58,
236
+ "avg_confidence": 0.481,
237
+ "accuracy": 0.0517
238
+ },
239
+ {
240
+ "range": [
241
+ 0.5,
242
+ 0.6
243
+ ],
244
+ "count": 218,
245
+ "avg_confidence": 0.537,
246
+ "accuracy": 0.0642
247
+ }
248
+ ]
249
+ }
250
+ },
251
+ "training": {
252
+ "epoch": 1,
253
+ "loss": 4.6568
254
+ }
255
+ },
256
+ {
257
+ "per_head": {
258
+ "relation_to_previous": {
259
+ "accuracy": 0.7283,
260
+ "macro_f1": 0.4422,
261
+ "per_label": {
262
+ "new": {
263
+ "precision": 1.0,
264
+ "recall": 1.0,
265
+ "f1": 1.0,
266
+ "support": 102
267
+ },
268
+ "follow_up": {
269
+ "precision": 0.6124,
270
+ "recall": 0.9294,
271
+ "f1": 0.7383,
272
+ "support": 85
273
+ },
274
+ "correction": {
275
+ "precision": 0.0,
276
+ "recall": 0.0,
277
+ "f1": 0.0,
278
+ "support": 29
279
+ },
280
+ "confirmation": {
281
+ "precision": 1.0,
282
+ "recall": 0.2,
283
+ "f1": 0.3333,
284
+ "support": 20
285
+ },
286
+ "cancellation": {
287
+ "precision": 0.0,
288
+ "recall": 0.0,
289
+ "f1": 0.0,
290
+ "support": 21
291
+ },
292
+ "closure": {
293
+ "precision": 0.4444,
294
+ "recall": 0.8421,
295
+ "f1": 0.5818,
296
+ "support": 19
297
+ }
298
+ },
299
+ "confusion_matrix": [
300
+ [
301
+ 102,
302
+ 0,
303
+ 0,
304
+ 0,
305
+ 0,
306
+ 0
307
+ ],
308
+ [
309
+ 0,
310
+ 79,
311
+ 0,
312
+ 0,
313
+ 0,
314
+ 6
315
+ ],
316
+ [
317
+ 0,
318
+ 24,
319
+ 0,
320
+ 0,
321
+ 0,
322
+ 5
323
+ ],
324
+ [
325
+ 0,
326
+ 8,
327
+ 3,
328
+ 4,
329
+ 0,
330
+ 5
331
+ ],
332
+ [
333
+ 0,
334
+ 15,
335
+ 2,
336
+ 0,
337
+ 0,
338
+ 4
339
+ ],
340
+ [
341
+ 0,
342
+ 3,
343
+ 0,
344
+ 0,
345
+ 0,
346
+ 16
347
+ ]
348
+ ]
349
+ },
350
+ "actionability": {
351
+ "accuracy": 0.5688,
352
+ "macro_f1": 0.5548,
353
+ "per_label": {
354
+ "none": {
355
+ "precision": 0.5303,
356
+ "recall": 0.5738,
357
+ "f1": 0.5512,
358
+ "support": 61
359
+ },
360
+ "review": {
361
+ "precision": 0.4124,
362
+ "recall": 0.5063,
363
+ "f1": 0.4545,
364
+ "support": 79
365
+ },
366
+ "act": {
367
+ "precision": 0.7257,
368
+ "recall": 0.6029,
369
+ "f1": 0.6586,
370
+ "support": 136
371
+ }
372
+ },
373
+ "confusion_matrix": [
374
+ [
375
+ 35,
376
+ 18,
377
+ 8
378
+ ],
379
+ [
380
+ 16,
381
+ 40,
382
+ 23
383
+ ],
384
+ [
385
+ 15,
386
+ 39,
387
+ 82
388
+ ]
389
+ ]
390
+ },
391
+ "retention": {
392
+ "accuracy": 0.5254,
393
+ "macro_f1": 0.3733,
394
+ "per_label": {
395
+ "ephemeral": {
396
+ "precision": 0.4845,
397
+ "recall": 0.5281,
398
+ "f1": 0.5054,
399
+ "support": 89
400
+ },
401
+ "useful": {
402
+ "precision": 0.5475,
403
+ "recall": 0.7,
404
+ "f1": 0.6144,
405
+ "support": 140
406
+ },
407
+ "remember": {
408
+ "precision": 0.0,
409
+ "recall": 0.0,
410
+ "f1": 0.0,
411
+ "support": 47
412
+ }
413
+ },
414
+ "confusion_matrix": [
415
+ [
416
+ 47,
417
+ 42,
418
+ 0
419
+ ],
420
+ [
421
+ 42,
422
+ 98,
423
+ 0
424
+ ],
425
+ [
426
+ 8,
427
+ 39,
428
+ 0
429
+ ]
430
+ ]
431
+ },
432
+ "urgency": {
433
+ "accuracy": 0.5362,
434
+ "macro_f1": 0.3555,
435
+ "per_label": {
436
+ "low": {
437
+ "precision": 0.5604,
438
+ "recall": 0.8529,
439
+ "f1": 0.6764,
440
+ "support": 136
441
+ },
442
+ "medium": {
443
+ "precision": 0.4638,
444
+ "recall": 0.3368,
445
+ "f1": 0.3902,
446
+ "support": 95
447
+ },
448
+ "high": {
449
+ "precision": 0.0,
450
+ "recall": 0.0,
451
+ "f1": 0.0,
452
+ "support": 45
453
+ }
454
+ },
455
+ "confusion_matrix": [
456
+ [
457
+ 116,
458
+ 20,
459
+ 0
460
+ ],
461
+ [
462
+ 63,
463
+ 32,
464
+ 0
465
+ ],
466
+ [
467
+ 28,
468
+ 17,
469
+ 0
470
+ ]
471
+ ]
472
+ }
473
+ },
474
+ "overall": {
475
+ "exact_match": 0.1123,
476
+ "macro_average_f1": 0.4314,
477
+ "automation_safe_accuracy": 0.0,
478
+ "automation_safe_coverage": 0.0,
479
+ "confidence_threshold": 0.8,
480
+ "confidence_calibration": {
481
+ "ece": 0.445582,
482
+ "bins": [
483
+ {
484
+ "range": [
485
+ 0.4,
486
+ 0.5
487
+ ],
488
+ "count": 44,
489
+ "avg_confidence": 0.4826,
490
+ "accuracy": 0.0455
491
+ },
492
+ {
493
+ "range": [
494
+ 0.5,
495
+ 0.6
496
+ ],
497
+ "count": 208,
498
+ "avg_confidence": 0.5677,
499
+ "accuracy": 0.125
500
+ },
501
+ {
502
+ "range": [
503
+ 0.6,
504
+ 0.7
505
+ ],
506
+ "count": 24,
507
+ "avg_confidence": 0.6107,
508
+ "accuracy": 0.125
509
+ }
510
+ ]
511
+ }
512
+ },
513
+ "training": {
514
+ "epoch": 2,
515
+ "loss": 3.7776
516
+ }
517
+ },
518
+ {
519
+ "per_head": {
520
+ "relation_to_previous": {
521
+ "accuracy": 0.808,
522
+ "macro_f1": 0.6475,
523
+ "per_label": {
524
+ "new": {
525
+ "precision": 1.0,
526
+ "recall": 1.0,
527
+ "f1": 1.0,
528
+ "support": 102
529
+ },
530
+ "follow_up": {
531
+ "precision": 0.7009,
532
+ "recall": 0.9647,
533
+ "f1": 0.8119,
534
+ "support": 85
535
+ },
536
+ "correction": {
537
+ "precision": 0.5294,
538
+ "recall": 0.3103,
539
+ "f1": 0.3913,
540
+ "support": 29
541
+ },
542
+ "confirmation": {
543
+ "precision": 0.9231,
544
+ "recall": 0.6,
545
+ "f1": 0.7273,
546
+ "support": 20
547
+ },
548
+ "cancellation": {
549
+ "precision": 0.75,
550
+ "recall": 0.1429,
551
+ "f1": 0.24,
552
+ "support": 21
553
+ },
554
+ "closure": {
555
+ "precision": 0.6522,
556
+ "recall": 0.7895,
557
+ "f1": 0.7143,
558
+ "support": 19
559
+ }
560
+ },
561
+ "confusion_matrix": [
562
+ [
563
+ 102,
564
+ 0,
565
+ 0,
566
+ 0,
567
+ 0,
568
+ 0
569
+ ],
570
+ [
571
+ 0,
572
+ 82,
573
+ 1,
574
+ 0,
575
+ 0,
576
+ 2
577
+ ],
578
+ [
579
+ 0,
580
+ 16,
581
+ 9,
582
+ 1,
583
+ 1,
584
+ 2
585
+ ],
586
+ [
587
+ 0,
588
+ 4,
589
+ 0,
590
+ 12,
591
+ 0,
592
+ 4
593
+ ],
594
+ [
595
+ 0,
596
+ 11,
597
+ 7,
598
+ 0,
599
+ 3,
600
+ 0
601
+ ],
602
+ [
603
+ 0,
604
+ 4,
605
+ 0,
606
+ 0,
607
+ 0,
608
+ 15
609
+ ]
610
+ ]
611
+ },
612
+ "actionability": {
613
+ "accuracy": 0.6304,
614
+ "macro_f1": 0.5798,
615
+ "per_label": {
616
+ "none": {
617
+ "precision": 0.6591,
618
+ "recall": 0.4754,
619
+ "f1": 0.5524,
620
+ "support": 61
621
+ },
622
+ "review": {
623
+ "precision": 0.5882,
624
+ "recall": 0.3797,
625
+ "f1": 0.4615,
626
+ "support": 79
627
+ },
628
+ "act": {
629
+ "precision": 0.6354,
630
+ "recall": 0.8456,
631
+ "f1": 0.7256,
632
+ "support": 136
633
+ }
634
+ },
635
+ "confusion_matrix": [
636
+ [
637
+ 29,
638
+ 7,
639
+ 25
640
+ ],
641
+ [
642
+ 8,
643
+ 30,
644
+ 41
645
+ ],
646
+ [
647
+ 7,
648
+ 14,
649
+ 115
650
+ ]
651
+ ]
652
+ },
653
+ "retention": {
654
+ "accuracy": 0.6703,
655
+ "macro_f1": 0.6498,
656
+ "per_label": {
657
+ "ephemeral": {
658
+ "precision": 0.661,
659
+ "recall": 0.4382,
660
+ "f1": 0.527,
661
+ "support": 89
662
+ },
663
+ "useful": {
664
+ "precision": 0.6398,
665
+ "recall": 0.85,
666
+ "f1": 0.7301,
667
+ "support": 140
668
+ },
669
+ "remember": {
670
+ "precision": 0.871,
671
+ "recall": 0.5745,
672
+ "f1": 0.6923,
673
+ "support": 47
674
+ }
675
+ },
676
+ "confusion_matrix": [
677
+ [
678
+ 39,
679
+ 49,
680
+ 1
681
+ ],
682
+ [
683
+ 18,
684
+ 119,
685
+ 3
686
+ ],
687
+ [
688
+ 2,
689
+ 18,
690
+ 27
691
+ ]
692
+ ]
693
+ },
694
+ "urgency": {
695
+ "accuracy": 0.5688,
696
+ "macro_f1": 0.4235,
697
+ "per_label": {
698
+ "low": {
699
+ "precision": 0.694,
700
+ "recall": 0.6838,
701
+ "f1": 0.6889,
702
+ "support": 136
703
+ },
704
+ "medium": {
705
+ "precision": 0.4565,
706
+ "recall": 0.6632,
707
+ "f1": 0.5408,
708
+ "support": 95
709
+ },
710
+ "high": {
711
+ "precision": 0.25,
712
+ "recall": 0.0222,
713
+ "f1": 0.0408,
714
+ "support": 45
715
+ }
716
+ },
717
+ "confusion_matrix": [
718
+ [
719
+ 93,
720
+ 42,
721
+ 1
722
+ ],
723
+ [
724
+ 30,
725
+ 63,
726
+ 2
727
+ ],
728
+ [
729
+ 11,
730
+ 33,
731
+ 1
732
+ ]
733
+ ]
734
+ }
735
+ },
736
+ "overall": {
737
+ "exact_match": 0.2101,
738
+ "macro_average_f1": 0.5752,
739
+ "automation_safe_accuracy": 0.0,
740
+ "automation_safe_coverage": 0.0,
741
+ "confidence_threshold": 0.8,
742
+ "confidence_calibration": {
743
+ "ece": 0.396414,
744
+ "bins": [
745
+ {
746
+ "range": [
747
+ 0.4,
748
+ 0.5
749
+ ],
750
+ "count": 10,
751
+ "avg_confidence": 0.4772,
752
+ "accuracy": 0.2
753
+ },
754
+ {
755
+ "range": [
756
+ 0.5,
757
+ 0.6
758
+ ],
759
+ "count": 128,
760
+ "avg_confidence": 0.5669,
761
+ "accuracy": 0.1484
762
+ },
763
+ {
764
+ "range": [
765
+ 0.6,
766
+ 0.7
767
+ ],
768
+ "count": 129,
769
+ "avg_confidence": 0.64,
770
+ "accuracy": 0.2326
771
+ },
772
+ {
773
+ "range": [
774
+ 0.7,
775
+ 0.8
776
+ ],
777
+ "count": 9,
778
+ "avg_confidence": 0.7196,
779
+ "accuracy": 0.7778
780
+ }
781
+ ]
782
+ }
783
+ },
784
+ "training": {
785
+ "epoch": 3,
786
+ "loss": 3.3415
787
+ }
788
+ },
789
+ {
790
+ "per_head": {
791
+ "relation_to_previous": {
792
+ "accuracy": 0.8333,
793
+ "macro_f1": 0.694,
794
+ "per_label": {
795
+ "new": {
796
+ "precision": 1.0,
797
+ "recall": 1.0,
798
+ "f1": 1.0,
799
+ "support": 102
800
+ },
801
+ "follow_up": {
802
+ "precision": 0.8571,
803
+ "recall": 0.9176,
804
+ "f1": 0.8864,
805
+ "support": 85
806
+ },
807
+ "correction": {
808
+ "precision": 0.5652,
809
+ "recall": 0.4483,
810
+ "f1": 0.5,
811
+ "support": 29
812
+ },
813
+ "confirmation": {
814
+ "precision": 0.8,
815
+ "recall": 0.6,
816
+ "f1": 0.6857,
817
+ "support": 20
818
+ },
819
+ "cancellation": {
820
+ "precision": 0.5455,
821
+ "recall": 0.2857,
822
+ "f1": 0.375,
823
+ "support": 21
824
+ },
825
+ "closure": {
826
+ "precision": 0.5588,
827
+ "recall": 1.0,
828
+ "f1": 0.717,
829
+ "support": 19
830
+ }
831
+ },
832
+ "confusion_matrix": [
833
+ [
834
+ 102,
835
+ 0,
836
+ 0,
837
+ 0,
838
+ 0,
839
+ 0
840
+ ],
841
+ [
842
+ 0,
843
+ 78,
844
+ 3,
845
+ 0,
846
+ 1,
847
+ 3
848
+ ],
849
+ [
850
+ 0,
851
+ 7,
852
+ 13,
853
+ 1,
854
+ 4,
855
+ 4
856
+ ],
857
+ [
858
+ 0,
859
+ 2,
860
+ 0,
861
+ 12,
862
+ 0,
863
+ 6
864
+ ],
865
+ [
866
+ 0,
867
+ 4,
868
+ 7,
869
+ 2,
870
+ 6,
871
+ 2
872
+ ],
873
+ [
874
+ 0,
875
+ 0,
876
+ 0,
877
+ 0,
878
+ 0,
879
+ 19
880
+ ]
881
+ ]
882
+ },
883
+ "actionability": {
884
+ "accuracy": 0.6486,
885
+ "macro_f1": 0.6252,
886
+ "per_label": {
887
+ "none": {
888
+ "precision": 0.5634,
889
+ "recall": 0.6557,
890
+ "f1": 0.6061,
891
+ "support": 61
892
+ },
893
+ "review": {
894
+ "precision": 0.5882,
895
+ "recall": 0.5063,
896
+ "f1": 0.5442,
897
+ "support": 79
898
+ },
899
+ "act": {
900
+ "precision": 0.7226,
901
+ "recall": 0.7279,
902
+ "f1": 0.7253,
903
+ "support": 136
904
+ }
905
+ },
906
+ "confusion_matrix": [
907
+ [
908
+ 40,
909
+ 8,
910
+ 13
911
+ ],
912
+ [
913
+ 14,
914
+ 40,
915
+ 25
916
+ ],
917
+ [
918
+ 17,
919
+ 20,
920
+ 99
921
+ ]
922
+ ]
923
+ },
924
+ "retention": {
925
+ "accuracy": 0.6703,
926
+ "macro_f1": 0.6542,
927
+ "per_label": {
928
+ "ephemeral": {
929
+ "precision": 0.6067,
930
+ "recall": 0.6067,
931
+ "f1": 0.6067,
932
+ "support": 89
933
+ },
934
+ "useful": {
935
+ "precision": 0.673,
936
+ "recall": 0.7643,
937
+ "f1": 0.7157,
938
+ "support": 140
939
+ },
940
+ "remember": {
941
+ "precision": 0.8571,
942
+ "recall": 0.5106,
943
+ "f1": 0.64,
944
+ "support": 47
945
+ }
946
+ },
947
+ "confusion_matrix": [
948
+ [
949
+ 54,
950
+ 35,
951
+ 0
952
+ ],
953
+ [
954
+ 29,
955
+ 107,
956
+ 4
957
+ ],
958
+ [
959
+ 6,
960
+ 17,
961
+ 24
962
+ ]
963
+ ]
964
+ },
965
+ "urgency": {
966
+ "accuracy": 0.5906,
967
+ "macro_f1": 0.4633,
968
+ "per_label": {
969
+ "low": {
970
+ "precision": 0.6477,
971
+ "recall": 0.8382,
972
+ "f1": 0.7308,
973
+ "support": 136
974
+ },
975
+ "medium": {
976
+ "precision": 0.4783,
977
+ "recall": 0.4632,
978
+ "f1": 0.4706,
979
+ "support": 95
980
+ },
981
+ "high": {
982
+ "precision": 0.625,
983
+ "recall": 0.1111,
984
+ "f1": 0.1887,
985
+ "support": 45
986
+ }
987
+ },
988
+ "confusion_matrix": [
989
+ [
990
+ 114,
991
+ 21,
992
+ 1
993
+ ],
994
+ [
995
+ 49,
996
+ 44,
997
+ 2
998
+ ],
999
+ [
1000
+ 13,
1001
+ 27,
1002
+ 5
1003
+ ]
1004
+ ]
1005
+ }
1006
+ },
1007
+ "overall": {
1008
+ "exact_match": 0.2319,
1009
+ "macro_average_f1": 0.6092,
1010
+ "automation_safe_accuracy": 0.0,
1011
+ "automation_safe_coverage": 0.0,
1012
+ "confidence_threshold": 0.8,
1013
+ "confidence_calibration": {
1014
+ "ece": 0.402291,
1015
+ "bins": [
1016
+ {
1017
+ "range": [
1018
+ 0.4,
1019
+ 0.5
1020
+ ],
1021
+ "count": 4,
1022
+ "avg_confidence": 0.4762,
1023
+ "accuracy": 0.0
1024
+ },
1025
+ {
1026
+ "range": [
1027
+ 0.5,
1028
+ 0.6
1029
+ ],
1030
+ "count": 74,
1031
+ "avg_confidence": 0.563,
1032
+ "accuracy": 0.1486
1033
+ },
1034
+ {
1035
+ "range": [
1036
+ 0.6,
1037
+ 0.7
1038
+ ],
1039
+ "count": 159,
1040
+ "avg_confidence": 0.6473,
1041
+ "accuracy": 0.2138
1042
+ },
1043
+ {
1044
+ "range": [
1045
+ 0.7,
1046
+ 0.8
1047
+ ],
1048
+ "count": 39,
1049
+ "avg_confidence": 0.7317,
1050
+ "accuracy": 0.4872
1051
+ }
1052
+ ]
1053
+ }
1054
+ },
1055
+ "training": {
1056
+ "epoch": 4,
1057
+ "loss": 2.9715
1058
+ }
1059
+ },
1060
+ {
1061
+ "per_head": {
1062
+ "relation_to_previous": {
1063
+ "accuracy": 0.8442,
1064
+ "macro_f1": 0.7157,
1065
+ "per_label": {
1066
+ "new": {
1067
+ "precision": 1.0,
1068
+ "recall": 1.0,
1069
+ "f1": 1.0,
1070
+ "support": 102
1071
+ },
1072
+ "follow_up": {
1073
+ "precision": 0.8571,
1074
+ "recall": 0.9176,
1075
+ "f1": 0.8864,
1076
+ "support": 85
1077
+ },
1078
+ "correction": {
1079
+ "precision": 0.5926,
1080
+ "recall": 0.5517,
1081
+ "f1": 0.5714,
1082
+ "support": 29
1083
+ },
1084
+ "confirmation": {
1085
+ "precision": 0.8571,
1086
+ "recall": 0.6,
1087
+ "f1": 0.7059,
1088
+ "support": 20
1089
+ },
1090
+ "cancellation": {
1091
+ "precision": 0.6667,
1092
+ "recall": 0.2857,
1093
+ "f1": 0.4,
1094
+ "support": 21
1095
+ },
1096
+ "closure": {
1097
+ "precision": 0.5758,
1098
+ "recall": 1.0,
1099
+ "f1": 0.7308,
1100
+ "support": 19
1101
+ }
1102
+ },
1103
+ "confusion_matrix": [
1104
+ [
1105
+ 102,
1106
+ 0,
1107
+ 0,
1108
+ 0,
1109
+ 0,
1110
+ 0
1111
+ ],
1112
+ [
1113
+ 0,
1114
+ 78,
1115
+ 3,
1116
+ 0,
1117
+ 1,
1118
+ 3
1119
+ ],
1120
+ [
1121
+ 0,
1122
+ 7,
1123
+ 16,
1124
+ 1,
1125
+ 2,
1126
+ 3
1127
+ ],
1128
+ [
1129
+ 0,
1130
+ 2,
1131
+ 0,
1132
+ 12,
1133
+ 0,
1134
+ 6
1135
+ ],
1136
+ [
1137
+ 0,
1138
+ 4,
1139
+ 8,
1140
+ 1,
1141
+ 6,
1142
+ 2
1143
+ ],
1144
+ [
1145
+ 0,
1146
+ 0,
1147
+ 0,
1148
+ 0,
1149
+ 0,
1150
+ 19
1151
+ ]
1152
+ ]
1153
+ },
1154
+ "actionability": {
1155
+ "accuracy": 0.6558,
1156
+ "macro_f1": 0.6342,
1157
+ "per_label": {
1158
+ "none": {
1159
+ "precision": 0.5972,
1160
+ "recall": 0.7049,
1161
+ "f1": 0.6466,
1162
+ "support": 61
1163
+ },
1164
+ "review": {
1165
+ "precision": 0.5493,
1166
+ "recall": 0.4937,
1167
+ "f1": 0.52,
1168
+ "support": 79
1169
+ },
1170
+ "act": {
1171
+ "precision": 0.7444,
1172
+ "recall": 0.7279,
1173
+ "f1": 0.7361,
1174
+ "support": 136
1175
+ }
1176
+ },
1177
+ "confusion_matrix": [
1178
+ [
1179
+ 43,
1180
+ 12,
1181
+ 6
1182
+ ],
1183
+ [
1184
+ 12,
1185
+ 39,
1186
+ 28
1187
+ ],
1188
+ [
1189
+ 17,
1190
+ 20,
1191
+ 99
1192
+ ]
1193
+ ]
1194
+ },
1195
+ "retention": {
1196
+ "accuracy": 0.6703,
1197
+ "macro_f1": 0.6666,
1198
+ "per_label": {
1199
+ "ephemeral": {
1200
+ "precision": 0.5816,
1201
+ "recall": 0.6404,
1202
+ "f1": 0.6096,
1203
+ "support": 89
1204
+ },
1205
+ "useful": {
1206
+ "precision": 0.6846,
1207
+ "recall": 0.7286,
1208
+ "f1": 0.7059,
1209
+ "support": 140
1210
+ },
1211
+ "remember": {
1212
+ "precision": 0.8966,
1213
+ "recall": 0.5532,
1214
+ "f1": 0.6842,
1215
+ "support": 47
1216
+ }
1217
+ },
1218
+ "confusion_matrix": [
1219
+ [
1220
+ 57,
1221
+ 32,
1222
+ 0
1223
+ ],
1224
+ [
1225
+ 35,
1226
+ 102,
1227
+ 3
1228
+ ],
1229
+ [
1230
+ 6,
1231
+ 15,
1232
+ 26
1233
+ ]
1234
+ ]
1235
+ },
1236
+ "urgency": {
1237
+ "accuracy": 0.5978,
1238
+ "macro_f1": 0.4845,
1239
+ "per_label": {
1240
+ "low": {
1241
+ "precision": 0.6948,
1242
+ "recall": 0.7868,
1243
+ "f1": 0.7379,
1244
+ "support": 136
1245
+ },
1246
+ "medium": {
1247
+ "precision": 0.4815,
1248
+ "recall": 0.5474,
1249
+ "f1": 0.5123,
1250
+ "support": 95
1251
+ },
1252
+ "high": {
1253
+ "precision": 0.4286,
1254
+ "recall": 0.1333,
1255
+ "f1": 0.2034,
1256
+ "support": 45
1257
+ }
1258
+ },
1259
+ "confusion_matrix": [
1260
+ [
1261
+ 107,
1262
+ 27,
1263
+ 2
1264
+ ],
1265
+ [
1266
+ 37,
1267
+ 52,
1268
+ 6
1269
+ ],
1270
+ [
1271
+ 10,
1272
+ 29,
1273
+ 6
1274
+ ]
1275
+ ]
1276
+ }
1277
+ },
1278
+ "overall": {
1279
+ "exact_match": 0.25,
1280
+ "macro_average_f1": 0.6252,
1281
+ "automation_safe_accuracy": 0.8,
1282
+ "automation_safe_coverage": 0.0181,
1283
+ "confidence_threshold": 0.8,
1284
+ "confidence_calibration": {
1285
+ "ece": 0.407114,
1286
+ "bins": [
1287
+ {
1288
+ "range": [
1289
+ 0.4,
1290
+ 0.5
1291
+ ],
1292
+ "count": 1,
1293
+ "avg_confidence": 0.4972,
1294
+ "accuracy": 0.0
1295
+ },
1296
+ {
1297
+ "range": [
1298
+ 0.5,
1299
+ 0.6
1300
+ ],
1301
+ "count": 55,
1302
+ "avg_confidence": 0.5704,
1303
+ "accuracy": 0.1818
1304
+ },
1305
+ {
1306
+ "range": [
1307
+ 0.6,
1308
+ 0.7
1309
+ ],
1310
+ "count": 143,
1311
+ "avg_confidence": 0.6475,
1312
+ "accuracy": 0.1538
1313
+ },
1314
+ {
1315
+ "range": [
1316
+ 0.7,
1317
+ 0.8
1318
+ ],
1319
+ "count": 72,
1320
+ "avg_confidence": 0.7343,
1321
+ "accuracy": 0.4583
1322
+ },
1323
+ {
1324
+ "range": [
1325
+ 0.8,
1326
+ 0.9
1327
+ ],
1328
+ "count": 5,
1329
+ "avg_confidence": 0.8067,
1330
+ "accuracy": 0.8
1331
+ }
1332
+ ]
1333
+ }
1334
+ },
1335
+ "training": {
1336
+ "epoch": 5,
1337
+ "loss": 2.7301
1338
+ }
1339
+ },
1340
+ {
1341
+ "per_head": {
1342
+ "relation_to_previous": {
1343
+ "accuracy": 0.8587,
1344
+ "macro_f1": 0.7646,
1345
+ "per_label": {
1346
+ "new": {
1347
+ "precision": 1.0,
1348
+ "recall": 1.0,
1349
+ "f1": 1.0,
1350
+ "support": 102
1351
+ },
1352
+ "follow_up": {
1353
+ "precision": 0.7921,
1354
+ "recall": 0.9412,
1355
+ "f1": 0.8602,
1356
+ "support": 85
1357
+ },
1358
+ "correction": {
1359
+ "precision": 0.64,
1360
+ "recall": 0.5517,
1361
+ "f1": 0.5926,
1362
+ "support": 29
1363
+ },
1364
+ "confirmation": {
1365
+ "precision": 0.8125,
1366
+ "recall": 0.65,
1367
+ "f1": 0.7222,
1368
+ "support": 20
1369
+ },
1370
+ "cancellation": {
1371
+ "precision": 0.8182,
1372
+ "recall": 0.4286,
1373
+ "f1": 0.5625,
1374
+ "support": 21
1375
+ },
1376
+ "closure": {
1377
+ "precision": 0.8095,
1378
+ "recall": 0.8947,
1379
+ "f1": 0.85,
1380
+ "support": 19
1381
+ }
1382
+ },
1383
+ "confusion_matrix": [
1384
+ [
1385
+ 102,
1386
+ 0,
1387
+ 0,
1388
+ 0,
1389
+ 0,
1390
+ 0
1391
+ ],
1392
+ [
1393
+ 0,
1394
+ 80,
1395
+ 4,
1396
+ 0,
1397
+ 0,
1398
+ 1
1399
+ ],
1400
+ [
1401
+ 0,
1402
+ 8,
1403
+ 16,
1404
+ 2,
1405
+ 2,
1406
+ 1
1407
+ ],
1408
+ [
1409
+ 0,
1410
+ 4,
1411
+ 1,
1412
+ 13,
1413
+ 0,
1414
+ 2
1415
+ ],
1416
+ [
1417
+ 0,
1418
+ 7,
1419
+ 4,
1420
+ 1,
1421
+ 9,
1422
+ 0
1423
+ ],
1424
+ [
1425
+ 0,
1426
+ 2,
1427
+ 0,
1428
+ 0,
1429
+ 0,
1430
+ 17
1431
+ ]
1432
+ ]
1433
+ },
1434
+ "actionability": {
1435
+ "accuracy": 0.6884,
1436
+ "macro_f1": 0.6666,
1437
+ "per_label": {
1438
+ "none": {
1439
+ "precision": 0.6333,
1440
+ "recall": 0.623,
1441
+ "f1": 0.6281,
1442
+ "support": 61
1443
+ },
1444
+ "review": {
1445
+ "precision": 0.5976,
1446
+ "recall": 0.6203,
1447
+ "f1": 0.6087,
1448
+ "support": 79
1449
+ },
1450
+ "act": {
1451
+ "precision": 0.7687,
1452
+ "recall": 0.7574,
1453
+ "f1": 0.763,
1454
+ "support": 136
1455
+ }
1456
+ },
1457
+ "confusion_matrix": [
1458
+ [
1459
+ 38,
1460
+ 14,
1461
+ 9
1462
+ ],
1463
+ [
1464
+ 8,
1465
+ 49,
1466
+ 22
1467
+ ],
1468
+ [
1469
+ 14,
1470
+ 19,
1471
+ 103
1472
+ ]
1473
+ ]
1474
+ },
1475
+ "retention": {
1476
+ "accuracy": 0.6703,
1477
+ "macro_f1": 0.6452,
1478
+ "per_label": {
1479
+ "ephemeral": {
1480
+ "precision": 0.6,
1481
+ "recall": 0.6067,
1482
+ "f1": 0.6034,
1483
+ "support": 89
1484
+ },
1485
+ "useful": {
1486
+ "precision": 0.6707,
1487
+ "recall": 0.7857,
1488
+ "f1": 0.7237,
1489
+ "support": 140
1490
+ },
1491
+ "remember": {
1492
+ "precision": 0.9545,
1493
+ "recall": 0.4468,
1494
+ "f1": 0.6087,
1495
+ "support": 47
1496
+ }
1497
+ },
1498
+ "confusion_matrix": [
1499
+ [
1500
+ 54,
1501
+ 35,
1502
+ 0
1503
+ ],
1504
+ [
1505
+ 29,
1506
+ 110,
1507
+ 1
1508
+ ],
1509
+ [
1510
+ 7,
1511
+ 19,
1512
+ 21
1513
+ ]
1514
+ ]
1515
+ },
1516
+ "urgency": {
1517
+ "accuracy": 0.6196,
1518
+ "macro_f1": 0.5411,
1519
+ "per_label": {
1520
+ "low": {
1521
+ "precision": 0.7576,
1522
+ "recall": 0.7353,
1523
+ "f1": 0.7463,
1524
+ "support": 136
1525
+ },
1526
+ "medium": {
1527
+ "precision": 0.5,
1528
+ "recall": 0.6316,
1529
+ "f1": 0.5581,
1530
+ "support": 95
1531
+ },
1532
+ "high": {
1533
+ "precision": 0.4583,
1534
+ "recall": 0.2444,
1535
+ "f1": 0.3188,
1536
+ "support": 45
1537
+ }
1538
+ },
1539
+ "confusion_matrix": [
1540
+ [
1541
+ 100,
1542
+ 32,
1543
+ 4
1544
+ ],
1545
+ [
1546
+ 26,
1547
+ 60,
1548
+ 9
1549
+ ],
1550
+ [
1551
+ 6,
1552
+ 28,
1553
+ 11
1554
+ ]
1555
+ ]
1556
+ }
1557
+ },
1558
+ "overall": {
1559
+ "exact_match": 0.308,
1560
+ "macro_average_f1": 0.6544,
1561
+ "automation_safe_accuracy": 1.0,
1562
+ "automation_safe_coverage": 0.0072,
1563
+ "confidence_threshold": 0.8,
1564
+ "confidence_calibration": {
1565
+ "ece": 0.360595,
1566
+ "bins": [
1567
+ {
1568
+ "range": [
1569
+ 0.5,
1570
+ 0.6
1571
+ ],
1572
+ "count": 43,
1573
+ "avg_confidence": 0.5795,
1574
+ "accuracy": 0.1628
1575
+ },
1576
+ {
1577
+ "range": [
1578
+ 0.6,
1579
+ 0.7
1580
+ ],
1581
+ "count": 154,
1582
+ "avg_confidence": 0.6525,
1583
+ "accuracy": 0.2338
1584
+ },
1585
+ {
1586
+ "range": [
1587
+ 0.7,
1588
+ 0.8
1589
+ ],
1590
+ "count": 77,
1591
+ "avg_confidence": 0.737,
1592
+ "accuracy": 0.5195
1593
+ },
1594
+ {
1595
+ "range": [
1596
+ 0.8,
1597
+ 0.9
1598
+ ],
1599
+ "count": 2,
1600
+ "avg_confidence": 0.8115,
1601
+ "accuracy": 1.0
1602
+ }
1603
+ ]
1604
+ }
1605
+ },
1606
+ "training": {
1607
+ "epoch": 6,
1608
+ "loss": 2.5877
1609
+ }
1610
+ },
+     {
+       "per_head": {
+         "relation_to_previous": {
+           "accuracy": 0.8659,
+           "macro_f1": 0.7757,
+           "per_label": {
+             "new": {
+               "precision": 1.0,
+               "recall": 1.0,
+               "f1": 1.0,
+               "support": 102
+             },
+             "follow_up": {
+               "precision": 0.8387,
+               "recall": 0.9176,
+               "f1": 0.8764,
+               "support": 85
+             },
+             "correction": {
+               "precision": 0.6296,
+               "recall": 0.5862,
+               "f1": 0.6071,
+               "support": 29
+             },
+             "confirmation": {
+               "precision": 0.875,
+               "recall": 0.7,
+               "f1": 0.7778,
+               "support": 20
+             },
+             "cancellation": {
+               "precision": 0.6923,
+               "recall": 0.4286,
+               "f1": 0.5294,
+               "support": 21
+             },
+             "closure": {
+               "precision": 0.76,
+               "recall": 1.0,
+               "f1": 0.8636,
+               "support": 19
+             }
+           },
+           "confusion_matrix": [
+             [
+               102,
+               0,
+               0,
+               0,
+               0,
+               0
+             ],
+             [
+               0,
+               78,
+               3,
+               0,
+               2,
+               2
+             ],
+             [
+               0,
+               8,
+               17,
+               1,
+               2,
+               1
+             ],
+             [
+               0,
+               4,
+               0,
+               14,
+               0,
+               2
+             ],
+             [
+               0,
+               3,
+               7,
+               1,
+               9,
+               1
+             ],
+             [
+               0,
+               0,
+               0,
+               0,
+               0,
+               19
+             ]
+           ]
+         },
+         "actionability": {
+           "accuracy": 0.6812,
+           "macro_f1": 0.6558,
+           "per_label": {
+             "none": {
+               "precision": 0.6032,
+               "recall": 0.623,
+               "f1": 0.6129,
+               "support": 61
+             },
+             "review": {
+               "precision": 0.5974,
+               "recall": 0.5823,
+               "f1": 0.5897,
+               "support": 79
+             },
+             "act": {
+               "precision": 0.7647,
+               "recall": 0.7647,
+               "f1": 0.7647,
+               "support": 136
+             }
+           },
+           "confusion_matrix": [
+             [
+               38,
+               15,
+               8
+             ],
+             [
+               9,
+               46,
+               24
+             ],
+             [
+               16,
+               16,
+               104
+             ]
+           ]
+         },
+         "retention": {
+           "accuracy": 0.6848,
+           "macro_f1": 0.6739,
+           "per_label": {
+             "ephemeral": {
+               "precision": 0.6235,
+               "recall": 0.5955,
+               "f1": 0.6092,
+               "support": 89
+             },
+             "useful": {
+               "precision": 0.6855,
+               "recall": 0.7786,
+               "f1": 0.7291,
+               "support": 140
+             },
+             "remember": {
+               "precision": 0.8438,
+               "recall": 0.5745,
+               "f1": 0.6835,
+               "support": 47
+             }
+           },
+           "confusion_matrix": [
+             [
+               53,
+               35,
+               1
+             ],
+             [
+               27,
+               109,
+               4
+             ],
+             [
+               5,
+               15,
+               27
+             ]
+           ]
+         },
+         "urgency": {
+           "accuracy": 0.6449,
+           "macro_f1": 0.5761,
+           "per_label": {
+             "low": {
+               "precision": 0.75,
+               "recall": 0.7721,
+               "f1": 0.7609,
+               "support": 136
+             },
+             "medium": {
+               "precision": 0.5413,
+               "recall": 0.6211,
+               "f1": 0.5784,
+               "support": 95
+             },
+             "high": {
+               "precision": 0.5185,
+               "recall": 0.3111,
+               "f1": 0.3889,
+               "support": 45
+             }
+           },
+           "confusion_matrix": [
+             [
+               105,
+               27,
+               4
+             ],
+             [
+               27,
+               59,
+               9
+             ],
+             [
+               8,
+               23,
+               14
+             ]
+           ]
+         }
+       },
+       "overall": {
+         "exact_match": 0.3116,
+         "macro_average_f1": 0.6704,
+         "automation_safe_accuracy": 0.8,
+         "automation_safe_coverage": 0.0362,
+         "confidence_threshold": 0.8,
+         "confidence_calibration": {
+           "ece": 0.370519,
+           "bins": [
+             {
+               "range": [
+                 0.5,
+                 0.6
+               ],
+               "count": 24,
+               "avg_confidence": 0.5696,
+               "accuracy": 0.25
+             },
+             {
+               "range": [
+                 0.6,
+                 0.7
+               ],
+               "count": 141,
+               "avg_confidence": 0.6522,
+               "accuracy": 0.2057
+             },
+             {
+               "range": [
+                 0.7,
+                 0.8
+               ],
+               "count": 101,
+               "avg_confidence": 0.7378,
+               "accuracy": 0.4257
+             },
+             {
+               "range": [
+                 0.8,
+                 0.9
+               ],
+               "count": 10,
+               "avg_confidence": 0.8123,
+               "accuracy": 0.8
+             }
+           ]
+         }
+       },
+       "training": {
+         "epoch": 7,
+         "loss": 2.4515
+       }
+     },
+     {
+       "per_head": {
+         "relation_to_previous": {
+           "accuracy": 0.8768,
+           "macro_f1": 0.7893,
+           "per_label": {
+             "new": {
+               "precision": 1.0,
+               "recall": 1.0,
+               "f1": 1.0,
+               "support": 102
+             },
+             "follow_up": {
+               "precision": 0.8764,
+               "recall": 0.9176,
+               "f1": 0.8966,
+               "support": 85
+             },
+             "correction": {
+               "precision": 0.6786,
+               "recall": 0.6552,
+               "f1": 0.6667,
+               "support": 29
+             },
+             "confirmation": {
+               "precision": 0.8235,
+               "recall": 0.7,
+               "f1": 0.7568,
+               "support": 20
+             },
+             "cancellation": {
+               "precision": 0.7143,
+               "recall": 0.4762,
+               "f1": 0.5714,
+               "support": 21
+             },
+             "closure": {
+               "precision": 0.7308,
+               "recall": 1.0,
+               "f1": 0.8444,
+               "support": 19
+             }
+           },
+           "confusion_matrix": [
+             [
+               102,
+               0,
+               0,
+               0,
+               0,
+               0
+             ],
+             [
+               0,
+               78,
+               3,
+               0,
+               2,
+               2
+             ],
+             [
+               0,
+               5,
+               19,
+               2,
+               2,
+               1
+             ],
+             [
+               0,
+               3,
+               0,
+               14,
+               0,
+               3
+             ],
+             [
+               0,
+               3,
+               6,
+               1,
+               10,
+               1
+             ],
+             [
+               0,
+               0,
+               0,
+               0,
+               0,
+               19
+             ]
+           ]
+         },
+         "actionability": {
+           "accuracy": 0.7101,
+           "macro_f1": 0.6834,
+           "per_label": {
+             "none": {
+               "precision": 0.6786,
+               "recall": 0.623,
+               "f1": 0.6496,
+               "support": 61
+             },
+             "review": {
+               "precision": 0.6571,
+               "recall": 0.5823,
+               "f1": 0.6174,
+               "support": 79
+             },
+             "act": {
+               "precision": 0.7467,
+               "recall": 0.8235,
+               "f1": 0.7832,
+               "support": 136
+             }
+           },
+           "confusion_matrix": [
+             [
+               38,
+               13,
+               10
+             ],
+             [
+               5,
+               46,
+               28
+             ],
+             [
+               13,
+               11,
+               112
+             ]
+           ]
+         },
+         "retention": {
+           "accuracy": 0.7029,
+           "macro_f1": 0.6849,
+           "per_label": {
+             "ephemeral": {
+               "precision": 0.6628,
+               "recall": 0.6404,
+               "f1": 0.6514,
+               "support": 89
+             },
+             "useful": {
+               "precision": 0.7025,
+               "recall": 0.7929,
+               "f1": 0.745,
+               "support": 140
+             },
+             "remember": {
+               "precision": 0.8125,
+               "recall": 0.5532,
+               "f1": 0.6582,
+               "support": 47
+             }
+           },
+           "confusion_matrix": [
+             [
+               57,
+               31,
+               1
+             ],
+             [
+               24,
+               111,
+               5
+             ],
+             [
+               5,
+               16,
+               26
+             ]
+           ]
+         },
+         "urgency": {
+           "accuracy": 0.6449,
+           "macro_f1": 0.5777,
+           "per_label": {
+             "low": {
+               "precision": 0.7536,
+               "recall": 0.7647,
+               "f1": 0.7591,
+               "support": 136
+             },
+             "medium": {
+               "precision": 0.5357,
+               "recall": 0.6316,
+               "f1": 0.5797,
+               "support": 95
+             },
+             "high": {
+               "precision": 0.5385,
+               "recall": 0.3111,
+               "f1": 0.3944,
+               "support": 45
+             }
+           },
+           "confusion_matrix": [
+             [
+               104,
+               28,
+               4
+             ],
+             [
+               27,
+               60,
+               8
+             ],
+             [
+               7,
+               24,
+               14
+             ]
+           ]
+         }
+       },
+       "overall": {
+         "exact_match": 0.3406,
+         "macro_average_f1": 0.6838,
+         "automation_safe_accuracy": 0.8,
+         "automation_safe_coverage": 0.0543,
+         "confidence_threshold": 0.8,
+         "confidence_calibration": {
+           "ece": 0.347199,
+           "bins": [
+             {
+               "range": [
+                 0.5,
+                 0.6
+               ],
+               "count": 23,
+               "avg_confidence": 0.572,
+               "accuracy": 0.2174
+             },
+             {
+               "range": [
+                 0.6,
+                 0.7
+               ],
+               "count": 134,
+               "avg_confidence": 0.6518,
+               "accuracy": 0.2239
+             },
+             {
+               "range": [
+                 0.7,
+                 0.8
+               ],
+               "count": 104,
+               "avg_confidence": 0.7415,
+               "accuracy": 0.4519
+             },
+             {
+               "range": [
+                 0.8,
+                 0.9
+               ],
+               "count": 15,
+               "avg_confidence": 0.8136,
+               "accuracy": 0.8
+             }
+           ]
+         }
+       },
+       "training": {
+         "epoch": 8,
+         "loss": 2.3349
+       }
+     },
+     {
+       "per_head": {
+         "relation_to_previous": {
+           "accuracy": 0.8841,
+           "macro_f1": 0.8031,
+           "per_label": {
+             "new": {
+               "precision": 1.0,
+               "recall": 1.0,
+               "f1": 1.0,
+               "support": 102
+             },
+             "follow_up": {
+               "precision": 0.8764,
+               "recall": 0.9176,
+               "f1": 0.8966,
+               "support": 85
+             },
+             "correction": {
+               "precision": 0.7,
+               "recall": 0.7241,
+               "f1": 0.7119,
+               "support": 29
+             },
+             "confirmation": {
+               "precision": 0.875,
+               "recall": 0.7,
+               "f1": 0.7778,
+               "support": 20
+             },
+             "cancellation": {
+               "precision": 0.7692,
+               "recall": 0.4762,
+               "f1": 0.5882,
+               "support": 21
+             },
+             "closure": {
+               "precision": 0.7308,
+               "recall": 1.0,
+               "f1": 0.8444,
+               "support": 19
+             }
+           },
+           "confusion_matrix": [
+             [
+               102,
+               0,
+               0,
+               0,
+               0,
+               0
+             ],
+             [
+               0,
+               78,
+               3,
+               0,
+               2,
+               2
+             ],
+             [
+               0,
+               5,
+               21,
+               1,
+               1,
+               1
+             ],
+             [
+               0,
+               3,
+               0,
+               14,
+               0,
+               3
+             ],
+             [
+               0,
+               3,
+               6,
+               1,
+               10,
+               1
+             ],
+             [
+               0,
+               0,
+               0,
+               0,
+               0,
+               19
+             ]
+           ]
+         },
+         "actionability": {
+           "accuracy": 0.6993,
+           "macro_f1": 0.6742,
+           "per_label": {
+             "none": {
+               "precision": 0.629,
+               "recall": 0.6393,
+               "f1": 0.6341,
+               "support": 61
+             },
+             "review": {
+               "precision": 0.6267,
+               "recall": 0.5949,
+               "f1": 0.6104,
+               "support": 79
+             },
+             "act": {
+               "precision": 0.7698,
+               "recall": 0.7868,
+               "f1": 0.7782,
+               "support": 136
+             }
+           },
+           "confusion_matrix": [
+             [
+               39,
+               14,
+               8
+             ],
+             [
+               8,
+               47,
+               24
+             ],
+             [
+               15,
+               14,
+               107
+             ]
+           ]
+         },
+         "retention": {
+           "accuracy": 0.6739,
+           "macro_f1": 0.6684,
+           "per_label": {
+             "ephemeral": {
+               "precision": 0.5842,
+               "recall": 0.6629,
+               "f1": 0.6211,
+               "support": 89
+             },
+             "useful": {
+               "precision": 0.7042,
+               "recall": 0.7143,
+               "f1": 0.7092,
+               "support": 140
+             },
+             "remember": {
+               "precision": 0.8182,
+               "recall": 0.5745,
+               "f1": 0.675,
+               "support": 47
+             }
+           },
+           "confusion_matrix": [
+             [
+               59,
+               29,
+               1
+             ],
+             [
+               35,
+               100,
+               5
+             ],
+             [
+               7,
+               13,
+               27
+             ]
+           ]
+         },
+         "urgency": {
+           "accuracy": 0.6413,
+           "macro_f1": 0.5729,
+           "per_label": {
+             "low": {
+               "precision": 0.7413,
+               "recall": 0.7794,
+               "f1": 0.7599,
+               "support": 136
+             },
+             "medium": {
+               "precision": 0.5327,
+               "recall": 0.6,
+               "f1": 0.5644,
+               "support": 95
+             },
+             "high": {
+               "precision": 0.5385,
+               "recall": 0.3111,
+               "f1": 0.3944,
+               "support": 45
+             }
+           },
+           "confusion_matrix": [
+             [
+               106,
+               26,
+               4
+             ],
+             [
+               30,
+               57,
+               8
+             ],
+             [
+               7,
+               24,
+               14
+             ]
+           ]
+         }
+       },
+       "overall": {
+         "exact_match": 0.3188,
+         "macro_average_f1": 0.6797,
+         "automation_safe_accuracy": 0.6818,
+         "automation_safe_coverage": 0.0797,
+         "confidence_threshold": 0.8,
+         "confidence_calibration": {
+           "ece": 0.370942,
+           "bins": [
+             {
+               "range": [
+                 0.5,
+                 0.6
+               ],
+               "count": 28,
+               "avg_confidence": 0.5705,
+               "accuracy": 0.1071
+             },
+             {
+               "range": [
+                 0.6,
+                 0.7
+               ],
+               "count": 129,
+               "avg_confidence": 0.6544,
+               "accuracy": 0.2713
+             },
+             {
+               "range": [
+                 0.7,
+                 0.8
+               ],
+               "count": 97,
+               "avg_confidence": 0.7429,
+               "accuracy": 0.3608
+             },
+             {
+               "range": [
+                 0.8,
+                 0.9
+               ],
+               "count": 22,
+               "avg_confidence": 0.815,
+               "accuracy": 0.6818
+             }
+           ]
+         }
+       },
+       "training": {
+         "epoch": 9,
+         "loss": 2.2636
+       }
+     },
+     {
+       "per_head": {
+         "relation_to_previous": {
+           "accuracy": 0.8804,
+           "macro_f1": 0.7966,
+           "per_label": {
+             "new": {
+               "precision": 1.0,
+               "recall": 1.0,
+               "f1": 1.0,
+               "support": 102
+             },
+             "follow_up": {
+               "precision": 0.8764,
+               "recall": 0.9176,
+               "f1": 0.8966,
+               "support": 85
+             },
+             "correction": {
+               "precision": 0.6897,
+               "recall": 0.6897,
+               "f1": 0.6897,
+               "support": 29
+             },
+             "confirmation": {
+               "precision": 0.875,
+               "recall": 0.7,
+               "f1": 0.7778,
+               "support": 20
+             },
+             "cancellation": {
+               "precision": 0.7143,
+               "recall": 0.4762,
+               "f1": 0.5714,
+               "support": 21
+             },
+             "closure": {
+               "precision": 0.7308,
+               "recall": 1.0,
+               "f1": 0.8444,
+               "support": 19
+             }
+           },
+           "confusion_matrix": [
+             [
+               102,
+               0,
+               0,
+               0,
+               0,
+               0
+             ],
+             [
+               0,
+               78,
+               3,
+               0,
+               2,
+               2
+             ],
+             [
+               0,
+               5,
+               20,
+               1,
+               2,
+               1
+             ],
+             [
+               0,
+               3,
+               0,
+               14,
+               0,
+               3
+             ],
+             [
+               0,
+               3,
+               6,
+               1,
+               10,
+               1
+             ],
+             [
+               0,
+               0,
+               0,
+               0,
+               0,
+               19
+             ]
+           ]
+         },
+         "actionability": {
+           "accuracy": 0.7174,
+           "macro_f1": 0.697,
+           "per_label": {
+             "none": {
+               "precision": 0.6557,
+               "recall": 0.6557,
+               "f1": 0.6557,
+               "support": 61
+             },
+             "review": {
+               "precision": 0.642,
+               "recall": 0.6582,
+               "f1": 0.65,
+               "support": 79
+             },
+             "act": {
+               "precision": 0.791,
+               "recall": 0.7794,
+               "f1": 0.7852,
+               "support": 136
+             }
+           },
+           "confusion_matrix": [
+             [
+               40,
+               14,
+               7
+             ],
+             [
+               6,
+               52,
+               21
+             ],
+             [
+               15,
+               15,
+               106
+             ]
+           ]
+         },
+         "retention": {
+           "accuracy": 0.6848,
+           "macro_f1": 0.6687,
+           "per_label": {
+             "ephemeral": {
+               "precision": 0.6222,
+               "recall": 0.6292,
+               "f1": 0.6257,
+               "support": 89
+             },
+             "useful": {
+               "precision": 0.6993,
+               "recall": 0.7643,
+               "f1": 0.7304,
+               "support": 140
+             },
+             "remember": {
+               "precision": 0.7879,
+               "recall": 0.5532,
+               "f1": 0.65,
+               "support": 47
+             }
+           },
+           "confusion_matrix": [
+             [
+               56,
+               32,
+               1
+             ],
+             [
+               27,
+               107,
+               6
+             ],
+             [
+               7,
+               14,
+               26
+             ]
+           ]
+         },
+         "urgency": {
+           "accuracy": 0.6304,
+           "macro_f1": 0.5648,
+           "per_label": {
+             "low": {
+               "precision": 0.7324,
+               "recall": 0.7647,
+               "f1": 0.7482,
+               "support": 136
+             },
+             "medium": {
+               "precision": 0.5185,
+               "recall": 0.5895,
+               "f1": 0.5517,
+               "support": 95
+             },
+             "high": {
+               "precision": 0.5385,
+               "recall": 0.3111,
+               "f1": 0.3944,
+               "support": 45
+             }
+           },
+           "confusion_matrix": [
+             [
+               104,
+               28,
+               4
+             ],
+             [
+               31,
+               56,
+               8
+             ],
+             [
+               7,
+               24,
+               14
+             ]
+           ]
+         }
+       },
+       "overall": {
+         "exact_match": 0.337,
+         "macro_average_f1": 0.6818,
+         "automation_safe_accuracy": 0.7,
+         "automation_safe_coverage": 0.0725,
+         "confidence_threshold": 0.8,
+         "confidence_calibration": {
+           "ece": 0.357574,
+           "bins": [
+             {
+               "range": [
+                 0.5,
+                 0.6
+               ],
+               "count": 22,
+               "avg_confidence": 0.5729,
+               "accuracy": 0.1364
+             },
+             {
+               "range": [
+                 0.6,
+                 0.7
+               ],
+               "count": 128,
+               "avg_confidence": 0.6553,
+               "accuracy": 0.2344
+             },
+             {
+               "range": [
+                 0.7,
+                 0.8
+               ],
+               "count": 106,
+               "avg_confidence": 0.7441,
+               "accuracy": 0.434
+             },
+             {
+               "range": [
+                 0.8,
+                 0.9
+               ],
+               "count": 20,
+               "avg_confidence": 0.817,
+               "accuracy": 0.7
+             }
+           ]
+         }
+       },
+       "training": {
+         "epoch": 10,
+         "loss": 2.2342
+       }
+     }
+   ],
+   "best_macro_average_f1": 0.6838
+ }
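The per-label figures in history.json can be re-derived from each head's confusion matrix. The sketch below is a minimal illustration, not the checkpoint's own evaluation code. It assumes rows are true labels and columns are predicted labels, which is the orientation that reproduces the recorded numbers. The helper name `per_label_metrics` is hypothetical.

```python
def per_label_metrics(matrix, labels):
    """Derive precision, recall, F1 and support per label.

    Assumption: matrix[i][j] counts examples whose true label is i
    and whose predicted label is j.
    """
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        support = sum(matrix[i])                   # row sum: examples with this true label
        predicted = sum(row[i] for row in matrix)  # column sum: examples predicted as this label
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return metrics


# Urgency confusion matrix copied from the epoch-10 entry of history.json.
urgency_matrix = [[104, 28, 4], [31, 56, 8], [7, 24, 14]]
urgency_labels = ["low", "medium", "high"]

urgency = per_label_metrics(urgency_matrix, urgency_labels)
macro_f1 = sum(m["f1"] for m in urgency.values()) / len(urgency)
correct = sum(urgency_matrix[i][i] for i in range(len(urgency_labels)))
accuracy = correct / sum(sum(row) for row in urgency_matrix)

print(round(accuracy, 4), round(macro_f1, 4))  # prints: 0.6304 0.5648
```

Both printed values match the `accuracy` and `macro_f1` recorded for the urgency head at epoch 10.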
metrics.json ADDED
@@ -0,0 +1,285 @@
+ {
+   "per_head": {
+     "relation_to_previous": {
+       "accuracy": 0.8768,
+       "macro_f1": 0.7893,
+       "per_label": {
+         "new": {
+           "precision": 1.0,
+           "recall": 1.0,
+           "f1": 1.0,
+           "support": 102
+         },
+         "follow_up": {
+           "precision": 0.8764,
+           "recall": 0.9176,
+           "f1": 0.8966,
+           "support": 85
+         },
+         "correction": {
+           "precision": 0.6786,
+           "recall": 0.6552,
+           "f1": 0.6667,
+           "support": 29
+         },
+         "confirmation": {
+           "precision": 0.8235,
+           "recall": 0.7,
+           "f1": 0.7568,
+           "support": 20
+         },
+         "cancellation": {
+           "precision": 0.7143,
+           "recall": 0.4762,
+           "f1": 0.5714,
+           "support": 21
+         },
+         "closure": {
+           "precision": 0.7308,
+           "recall": 1.0,
+           "f1": 0.8444,
+           "support": 19
+         }
+       },
+       "confusion_matrix": [
+         [
+           102,
+           0,
+           0,
+           0,
+           0,
+           0
+         ],
+         [
+           0,
+           78,
+           3,
+           0,
+           2,
+           2
+         ],
+         [
+           0,
+           5,
+           19,
+           2,
+           2,
+           1
+         ],
+         [
+           0,
+           3,
+           0,
+           14,
+           0,
+           3
+         ],
+         [
+           0,
+           3,
+           6,
+           1,
+           10,
+           1
+         ],
+         [
+           0,
+           0,
+           0,
+           0,
+           0,
+           19
+         ]
+       ]
+     },
+     "actionability": {
+       "accuracy": 0.7101,
+       "macro_f1": 0.6834,
+       "per_label": {
+         "none": {
+           "precision": 0.6786,
+           "recall": 0.623,
+           "f1": 0.6496,
+           "support": 61
+         },
+         "review": {
+           "precision": 0.6571,
+           "recall": 0.5823,
+           "f1": 0.6174,
+           "support": 79
+         },
+         "act": {
+           "precision": 0.7467,
+           "recall": 0.8235,
+           "f1": 0.7832,
+           "support": 136
+         }
+       },
+       "confusion_matrix": [
+         [
+           38,
+           13,
+           10
+         ],
+         [
+           5,
+           46,
+           28
+         ],
+         [
+           13,
+           11,
+           112
+         ]
+       ]
+     },
+     "retention": {
+       "accuracy": 0.7029,
+       "macro_f1": 0.6849,
+       "per_label": {
+         "ephemeral": {
+           "precision": 0.6628,
+           "recall": 0.6404,
+           "f1": 0.6514,
+           "support": 89
+         },
+         "useful": {
+           "precision": 0.7025,
+           "recall": 0.7929,
+           "f1": 0.745,
+           "support": 140
+         },
+         "remember": {
+           "precision": 0.8125,
+           "recall": 0.5532,
+           "f1": 0.6582,
+           "support": 47
+         }
+       },
+       "confusion_matrix": [
+         [
+           57,
+           31,
+           1
+         ],
+         [
+           24,
+           111,
+           5
+         ],
+         [
+           5,
+           16,
+           26
+         ]
+       ]
+     },
+     "urgency": {
+       "accuracy": 0.6449,
+       "macro_f1": 0.5777,
+       "per_label": {
+         "low": {
+           "precision": 0.7536,
+           "recall": 0.7647,
+           "f1": 0.7591,
+           "support": 136
+         },
+         "medium": {
+           "precision": 0.5357,
+           "recall": 0.6316,
+           "f1": 0.5797,
+           "support": 95
+         },
+         "high": {
+           "precision": 0.5385,
+           "recall": 0.3111,
+           "f1": 0.3944,
+           "support": 45
+         }
+       },
+       "confusion_matrix": [
+         [
+           104,
+           28,
+           4
+         ],
+         [
+           27,
+           60,
+           8
+         ],
+         [
+           7,
+           24,
+           14
+         ]
+       ]
+     }
+   },
+   "overall": {
+     "exact_match": 0.3406,
+     "macro_average_f1": 0.6838,
+     "automation_safe_accuracy": 0.6222,
+     "automation_safe_coverage": 0.163,
+     "confidence_threshold": 0.8,
+     "confidence_calibration": {
+       "ece": 0.395419,
+       "bins": [
+         {
+           "range": [
+             0.5,
+             0.6
+           ],
+           "count": 6,
+           "avg_confidence": 0.5757,
+           "accuracy": 0.3333
+         },
+         {
+           "range": [
+             0.6,
+             0.7
+           ],
+           "count": 86,
+           "avg_confidence": 0.6646,
+           "accuracy": 0.2326
+         },
+         {
+           "range": [
+             0.7,
+             0.8
+           ],
+           "count": 139,
+           "avg_confidence": 0.7484,
+           "accuracy": 0.3165
+         },
+         {
+           "range": [
+             0.8,
+             0.9
+           ],
+           "count": 42,
+           "avg_confidence": 0.8384,
+           "accuracy": 0.5952
+         },
+         {
+           "range": [
+             0.9,
+             1.0
+           ],
+           "count": 3,
+           "avg_confidence": 0.9038,
+           "accuracy": 1.0
+         }
+       ]
+     }
+   },
+   "temperature_scaling": {
+     "method": "per_head_temperature_scaling",
+     "per_head": {
+       "relation_to_previous": 0.630957,
+       "actionability": 0.891251,
+       "retention": 0.944061,
+       "urgency": 0.891251
+     }
+   }
+ }
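The calibration summary in metrics.json can be cross-checked from its own bins. The sketch below assumes the usual definition of expected calibration error (the count-weighted mean of |accuracy - avg_confidence| over bins) and assumes the "automation safe" figures cover predictions whose confidence is at or above `confidence_threshold`. Neither definition is stated in the commit, but both reproduce the recorded values up to rounding of the bin inputs.

```python
# Bins copied from "confidence_calibration" in metrics.json.
bins = [
    {"range": [0.5, 0.6], "count": 6, "avg_confidence": 0.5757, "accuracy": 0.3333},
    {"range": [0.6, 0.7], "count": 86, "avg_confidence": 0.6646, "accuracy": 0.2326},
    {"range": [0.7, 0.8], "count": 139, "avg_confidence": 0.7484, "accuracy": 0.3165},
    {"range": [0.8, 0.9], "count": 42, "avg_confidence": 0.8384, "accuracy": 0.5952},
    {"range": [0.9, 1.0], "count": 3, "avg_confidence": 0.9038, "accuracy": 1.0},
]
confidence_threshold = 0.8

total = sum(b["count"] for b in bins)

# Expected calibration error: count-weighted gap between confidence and accuracy.
ece = sum(b["count"] * abs(b["accuracy"] - b["avg_confidence"]) for b in bins) / total

# Assumed definition: "safe" predictions are those in bins at or above the threshold.
safe_bins = [b for b in bins if b["range"][0] >= confidence_threshold]
safe_count = sum(b["count"] for b in safe_bins)
coverage = safe_count / total
safe_accuracy = sum(b["count"] * b["accuracy"] for b in safe_bins) / safe_count

print(total, round(ece, 3), round(coverage, 3), round(safe_accuracy, 4))
```

Under these assumptions the script gives 276 examples, an ECE of about 0.395, coverage of 0.163 and safe accuracy of 0.6222, in line with `ece`, `automation_safe_coverage` and `automation_safe_accuracy` above.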
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20478cbe0d1d00b60890e6dfff5e313e8c51f7fa81572b2e8528f1d603364f1e
+ size 133553471
model_config.json ADDED
@@ -0,0 +1,60 @@
+ {
+   "encoder_name": "microsoft/MiniLM-L12-H384-uncased",
+   "dropout": 0.1,
+   "action_vocab": [
+     "none",
+     "create",
+     "update",
+     "send",
+     "store",
+     "route",
+     "schedule",
+     "dismissed",
+     "clarify",
+     "search",
+     "notify",
+     "cancel",
+     "complete",
+     "other"
+   ],
+   "outcome_vocab": [
+     "success",
+     "pending",
+     "failed",
+     "cancelled",
+     "unknown"
+   ],
+   "label_maps": {
+     "relation_to_previous": [
+       "new",
+       "follow_up",
+       "correction",
+       "confirmation",
+       "cancellation",
+       "closure"
+     ],
+     "actionability": [
+       "none",
+       "review",
+       "act"
+     ],
+     "retention": [
+       "ephemeral",
+       "useful",
+       "remember"
+     ],
+     "urgency": [
+       "low",
+       "medium",
+       "high"
+     ]
+   },
+   "structured_hidden_dim": 32,
+   "recency_embed_dim": 8,
+   "pooling_type": "attention",
+   "use_head_dependencies": true,
+   "dependency_hidden_dim": 32,
+   "feature_mode": "full_interaction",
+   "max_length": 128,
+   "recency_max": 3600
+ }
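As a rough illustration of how `label_maps` might be consumed, the sketch below turns per-head score vectors into label names by taking the argmax. It assumes each head emits one score per label in the order listed in model_config.json. The function name `decode_heads` and the example logits are hypothetical and not taken from the checkpoint code.

```python
# Label order copied from "label_maps" in model_config.json.
LABEL_MAPS = {
    "relation_to_previous": [
        "new", "follow_up", "correction", "confirmation", "cancellation", "closure",
    ],
    "actionability": ["none", "review", "act"],
    "retention": ["ephemeral", "useful", "remember"],
    "urgency": ["low", "medium", "high"],
}


def decode_heads(head_logits):
    """Map each head's score vector to the label with the highest score."""
    decoded = {}
    for head, logits in head_logits.items():
        labels = LABEL_MAPS[head]
        if len(logits) != len(labels):
            raise ValueError(f"{head}: expected {len(labels)} scores, got {len(logits)}")
        best_index = max(range(len(logits)), key=logits.__getitem__)
        decoded[head] = labels[best_index]
    return decoded


# Made-up logits, for illustration only.
example_logits = {
    "relation_to_previous": [0.1, 2.3, 0.4, -1.0, 0.0, -0.5],
    "actionability": [0.2, 0.1, 1.7],
    "retention": [0.9, 1.1, -0.3],
    "urgency": [1.5, 0.2, -0.8],
}
print(decode_heads(example_logits))
```

Validating the vector length against the label list catches a mismatch between a checkpoint and its config early, rather than silently returning the wrong label.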
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
temperature_scaling.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "method": "per_head_temperature_scaling",
+   "source_split": "validation",
+   "per_head": {
+     "relation_to_previous": 0.630957,
+     "actionability": 0.891251,
+     "retention": 0.944061,
+     "urgency": 0.891251
+   }
+ }
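A minimal sketch of how the per-head temperatures in temperature_scaling.json could be applied at inference time. It assumes the standard temperature-scaling convention of dividing logits by the temperature before the softmax. The commit does not show whether the checkpoint code divides or multiplies, so treat the direction as an assumption. The function names and the example logits are hypothetical.

```python
import math

# Temperatures copied from temperature_scaling.json (fitted on the validation split).
TEMPERATURES = {
    "relation_to_previous": 0.630957,
    "actionability": 0.891251,
    "retention": 0.944061,
    "urgency": 0.891251,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrated_probs(head, logits):
    """Assumed convention: softmax(logits / T) with the head's own temperature T."""
    temperature = TEMPERATURES[head]
    return softmax([z / temperature for z in logits])


# Made-up logits for the three urgency labels, for illustration only.
raw_logits = [1.2, 0.4, -0.3]
plain = softmax(raw_logits)
scaled = calibrated_probs("urgency", raw_logits)
print([round(p, 4) for p in plain])
print([round(p, 4) for p in scaled])
```

Under this convention every temperature in the file is below 1, so scaling sharpens the distribution and raises the top-class probability. That would be consistent with metrics.json placing more predictions in the 0.8 to 1.0 confidence bins than the epoch-8 entry of history.json does.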
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
training_args.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "train_file": "/home/user/app/data/synthetic/train.jsonl",
+   "validation_file": "/home/user/app/data/synthetic/validation.jsonl",
+   "test_file": null,
+   "output_dir": "/tmp/tiny-router-r6z0atnx/checkpoint",
+   "encoder_name": "microsoft/MiniLM-L12-H384-uncased",
+   "device": "cuda",
+   "feature_mode": "full_interaction",
+   "pooling_type": "attention",
+   "use_head_dependencies": true,
+   "dependency_hidden_dim": 32,
+   "max_length": 128,
+   "recency_max": 3600,
+   "batch_size": 32,
+   "epochs": 10,
+   "encoder_lr": 2e-05,
+   "head_lr": 2e-05,
+   "weight_decay": 0.01,
+   "warmup_ratio": 0.1,
+   "dropout": 0.1,
+   "seed": 13,
+   "patience": 2,
+   "mixed_precision": false,
+   "confidence_threshold": 0.8,
+   "head_loss_weights": "{}"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff