Committed by AnnyNguyen
Commit b1ef0a2 · verified · 1 parent: 626fcfc

Upload evaluation_log_textcnn.txt with huggingface_hub

Files changed (1)
  1. evaluation_log_textcnn.txt +443 -0
evaluation_log_textcnn.txt ADDED
@@ -0,0 +1,443 @@
+ EVALUATION LOG - 2025-10-29 03:44:41
+ ================================================================================
+
+
+
+ ================================================================================
+ STARTING POST-TRAINING EVALUATION
+ ================================================================================
+ ✅ Test data loaded: 40532 samples
+ Columns: ['dataset', 'type', 'comment', 'label']
+ Using device: cuda
+
+ ============================================================
+ EVALUATING MODEL: PHOBERT-V1
+ ============================================================
+ ✅ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1
+ ✅ Tokenizer loaded for phobert-v1
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9421
+ F1 Macro: 0.8308
+ F1 Weighted: 0.9394
+
+ ============================================================
+ EVALUATING MODEL: PHOBERT-V2
+ ============================================================
+ ✅ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2
+ ✅ Tokenizer loaded for phobert-v2
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9341
+ F1 Macro: 0.8048
+ F1 Weighted: 0.9326
+
+ ============================================================
+ EVALUATING MODEL: BARTPHO
+ ============================================================
+ ✅ Model bartpho loaded from outputs/hate-speech-detection/bartpho
+ ✅ Tokenizer loaded for bartpho
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.8985
+ F1 Macro: 0.6791
+ F1 Weighted: 0.8886
+
+ ============================================================
+ EVALUATING MODEL: VISOBERT
+ ============================================================
+ ✅ Model visobert loaded from outputs/hate-speech-detection/visobert
+ ✅ Tokenizer loaded for visobert
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9372
+ F1 Macro: 0.8241
+ F1 Weighted: 0.9379
+
+ ============================================================
+ EVALUATING MODEL: VIHATE-T5
+ ============================================================
+ ✅ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5
+ ✅ Tokenizer loaded for vihate-t5
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9551
+ F1 Macro: 0.8718
+ F1 Weighted: 0.9535
+
+ ============================================================
+ EVALUATING MODEL: XLM-R
+ ============================================================
+ ✅ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r
+ ✅ Tokenizer loaded for xlm-r
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9203
+ F1 Macro: 0.7625
+ F1 Weighted: 0.9177
+
+ ============================================================
+ EVALUATING MODEL: ROBERTA-GRU
+ ============================================================
+ ✅ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru
+ ✅ Tokenizer loaded for roberta-gru
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9537
+ F1 Macro: 0.8716
+ F1 Weighted: 0.9530
+
+ ============================================================
+ EVALUATING MODEL: BILSTM
+ ============================================================
+ ✅ Model bilstm loaded from outputs/hate-speech-detection/bilstm
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ℹ️ BILSTM evaluation requires special handling
+ Using dummy predictions for BILSTM
+ ✅ Evaluation completed!
+ Accuracy: 0.8388
+ F1 Macro: 0.3041
+ F1 Weighted: 0.7652
+
+ ============================================================
+ EVALUATING MODEL: TEXTCNN
+ ============================================================
+ ✅ Model textcnn loaded from outputs/hate-speech-detection/textcnn
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ℹ️ TEXTCNN evaluation requires special handling
+ Using dummy predictions for TEXTCNN
+ ✅ Evaluation completed!
+ Accuracy: 0.8388
+ F1 Macro: 0.3041
+ F1 Weighted: 0.7652
+
+ ============================================================
+ EVALUATING MODEL: MBERT
+ ============================================================
+ ✅ Model mbert loaded from outputs/hate-speech-detection/mbert
+ ✅ Tokenizer loaded for mbert
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9360
+ F1 Macro: 0.8044
+ F1 Weighted: 0.9317
+
+ ============================================================
+ EVALUATING MODEL: SPHOBERT
+ ============================================================
+ ✅ Model sphobert loaded from outputs/hate-speech-detection/sphobert
+ ✅ Tokenizer loaded for sphobert
+ Evaluating on 40532 samples...
+ Text column: comment, Label column: label
+ ✅ Evaluation completed!
+ Accuracy: 0.9143
+ F1 Macro: 0.7378
+ F1 Weighted: 0.9096
+
+
+ ================================================================================
+ FINAL EVALUATION RESULTS - 2025-10-29 04:14:15
+ ================================================================================
+
+ EVALUATION SUMMARY
+ --------------------------------------------------
+ Model           Accuracy   F1 Macro   F1 Weighted  Samples
+ --------------------------------------------------
+ phobert-v1      0.9421     0.8308     0.9394       40532
+ phobert-v2      0.9341     0.8048     0.9326       40532
+ bartpho         0.8985     0.6791     0.8886       40532
+ visobert        0.9372     0.8241     0.9379       40532
+ vihate-t5       0.9551     0.8718     0.9535       40532
+ xlm-r           0.9203     0.7625     0.9177       40532
+ roberta-gru     0.9537     0.8716     0.9530       40532
+ bilstm          0.8388     0.3041     0.7652       40532
+ textcnn         0.8388     0.3041     0.7652       40532
+ mbert           0.9360     0.8044     0.9317       40532
+ sphobert        0.9143     0.7378     0.9096       40532
+
+ ================================================================================
+
+ DETAILED RESULTS - PHOBERT-V1
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/phobert-v1
+ Number of Samples: 40532
+ Accuracy: 0.9421
+ F1 Macro: 0.8308
+ F1 Weighted: 0.9394
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9554     0.9868     0.9709     33997.0
+ OFFENSIVE       0.7910     0.6581     0.7185     2094.0
+ HATE            0.8866     0.7341     0.8032     4441.0
+ macro avg       0.8777     0.7930     0.8308     40532.0
+ weighted avg    0.9394     0.9421     0.9394     40532.0
+
+ Confusion Matrix:
+ [[33548   196   253]
+  [  552  1378   164]
+  [ 1013   168  3260]]
+
+ ================================================================================
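The headline numbers in each detailed section can be re-derived from the confusion matrix alone. The sketch below does this for the PHOBERT-V1 matrix. It assumes that rows are true labels and columns are predicted labels, in the order CLEAN, OFFENSIVE, HATE; the row sums match the Support column, which is consistent with that reading. The function name `metrics_from_confusion` is made up for illustration and is not taken from the evaluation script.

```python
# Confusion matrix copied from the PHOBERT-V1 section of the log.
# Assumption: rows = true labels, columns = predictions,
# ordered CLEAN, OFFENSIVE, HATE.
CM = [
    [33548, 196, 253],
    [552, 1378, 164],
    [1013, 168, 3260],
]


def metrics_from_confusion(cm):
    """Return accuracy, macro F1 and weighted F1 for a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(row) for row in cm]  # true count per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    f1_per_class = []
    for k in range(n):
        tp = cm[k][k]
        precision = tp / predicted[k] if predicted[k] else 0.0
        recall = tp / support[k] if support[k] else 0.0
        denom = precision + recall
        f1_per_class.append(2 * precision * recall / denom if denom else 0.0)

    accuracy = sum(cm[k][k] for k in range(n)) / total
    f1_macro = sum(f1_per_class) / n  # unweighted mean over classes
    f1_weighted = sum(f * s for f, s in zip(f1_per_class, support)) / total
    return accuracy, f1_macro, f1_weighted


accuracy, f1_macro, f1_weighted = metrics_from_confusion(CM)
print(f"Accuracy: {accuracy:.4f}")
print(f"F1 Macro: {f1_macro:.4f}")
print(f"F1 Weighted: {f1_weighted:.4f}")
```

Macro F1 weights the three classes equally, which is why it sits well below accuracy here: the OFFENSIVE and HATE classes are much smaller than CLEAN and are harder for the model.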
+
+ DETAILED RESULTS - PHOBERT-V2
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/phobert-v2
+ Number of Samples: 40532
+ Accuracy: 0.9341
+ F1 Macro: 0.8048
+ F1 Weighted: 0.9326
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9635     0.9739     0.9687     33997.0
+ OFFENSIVE       0.7505     0.5903     0.6608     2094.0
+ HATE            0.7779     0.7919     0.7849     4441.0
+ macro avg       0.8306     0.7854     0.8048     40532.0
+ weighted avg    0.9321     0.9341     0.9326     40532.0
+
+ Confusion Matrix:
+ [[33109   219   669]
+  [  523  1236   335]
+  [  732   192  3517]]
+
+ ================================================================================
+
+ DETAILED RESULTS - BARTPHO
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/bartpho
+ Number of Samples: 40532
+ Accuracy: 0.8985
+ F1 Macro: 0.6791
+ F1 Weighted: 0.8886
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9228     0.9770     0.9491     33997.0
+ OFFENSIVE       0.6527     0.3563     0.4609     2094.0
+ HATE            0.7238     0.5535     0.6273     4441.0
+ macro avg       0.7664     0.6289     0.6791     40532.0
+ weighted avg    0.8871     0.8985     0.8886     40532.0
+
+ Confusion Matrix:
+ [[33215   235   547]
+  [  957   746   391]
+  [ 1821   162  2458]]
+
+ ================================================================================
+
+ DETAILED RESULTS - VISOBERT
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/visobert
+ Number of Samples: 40532
+ Accuracy: 0.9372
+ F1 Macro: 0.8241
+ F1 Weighted: 0.9379
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9714     0.9687     0.9700     33997.0
+ OFFENSIVE       0.6463     0.7574     0.6974     2094.0
+ HATE            0.8305     0.7809     0.8049     4441.0
+ macro avg       0.8160     0.8357     0.8241     40532.0
+ weighted avg    0.9392     0.9372     0.9379     40532.0
+
+ Confusion Matrix:
+ [[32932   590   475]
+  [  275  1586   233]
+  [  695   278  3468]]
+
+ ================================================================================
+
+ DETAILED RESULTS - VIHATE-T5
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/vihate-t5
+ Number of Samples: 40532
+ Accuracy: 0.9551
+ F1 Macro: 0.8718
+ F1 Weighted: 0.9535
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9660     0.9883     0.9770     33997.0
+ OFFENSIVE       0.8788     0.7096     0.7852     2094.0
+ HATE            0.8931     0.8165     0.8531     4441.0
+ macro avg       0.9126     0.8381     0.8718     40532.0
+ weighted avg    0.9535     0.9551     0.9535     40532.0
+
+ Confusion Matrix:
+ [[33599   124   274]
+  [  448  1486   160]
+  [  734    81  3626]]
+
+ ================================================================================
+
+ DETAILED RESULTS - XLM-R
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/xlm-r
+ Number of Samples: 40532
+ Accuracy: 0.9203
+ F1 Macro: 0.7625
+ F1 Weighted: 0.9177
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9514     0.9733     0.9622     33997.0
+ OFFENSIVE       0.6284     0.5702     0.5979     2094.0
+ HATE            0.7834     0.6791     0.7275     4441.0
+ macro avg       0.7877     0.7409     0.7625     40532.0
+ weighted avg    0.9163     0.9203     0.9177     40532.0
+
+ Confusion Matrix:
+ [[33090   418   489]
+  [  555  1194   345]
+  [ 1137   288  3016]]
+
+ ================================================================================
+
+ DETAILED RESULTS - ROBERTA-GRU
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/roberta-gru
+ Number of Samples: 40532
+ Accuracy: 0.9537
+ F1 Macro: 0.8716
+ F1 Weighted: 0.9530
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9711     0.9825     0.9768     33997.0
+ OFFENSIVE       0.8136     0.7693     0.7909     2094.0
+ HATE            0.8761     0.8201     0.8472     4441.0
+ macro avg       0.8870     0.8573     0.8716     40532.0
+ weighted avg    0.9526     0.9537     0.9530     40532.0
+
+ Confusion Matrix:
+ [[33402   237   358]
+  [  326  1611   157]
+  [  667   132  3642]]
+
+ ================================================================================
+
+ DETAILED RESULTS - BILSTM
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/bilstm
+ Number of Samples: 40532
+ Accuracy: 0.8388
+ F1 Macro: 0.3041
+ F1 Weighted: 0.7652
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.8388     1.0000     0.9123     33997.0
+ OFFENSIVE       0.0000     0.0000     0.0000     2094.0
+ HATE            0.0000     0.0000     0.0000     4441.0
+ macro avg       0.2796     0.3333     0.3041     40532.0
+ weighted avg    0.7035     0.8388     0.7652     40532.0
+
+ Confusion Matrix:
+ [[33997     0     0]
+  [ 2094     0     0]
+  [ 4441     0     0]]
+
+ ================================================================================
+
+ DETAILED RESULTS - TEXTCNN
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/textcnn
+ Number of Samples: 40532
+ Accuracy: 0.8388
+ F1 Macro: 0.3041
+ F1 Weighted: 0.7652
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.8388     1.0000     0.9123     33997.0
+ OFFENSIVE       0.0000     0.0000     0.0000     2094.0
+ HATE            0.0000     0.0000     0.0000     4441.0
+ macro avg       0.2796     0.3333     0.3041     40532.0
+ weighted avg    0.7035     0.8388     0.7652     40532.0
+
+ Confusion Matrix:
+ [[33997     0     0]
+  [ 2094     0     0]
+  [ 4441     0     0]]
+
+ ================================================================================
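The BILSTM and TEXTCNN sections are identical, and both confusion matrices place every sample in the first column. This matches the "Using dummy predictions" lines in the log: the scores describe a constant CLEAN prediction rather than either trained model. The sketch below reproduces those figures from the Support column alone. The helper name `constant_prediction_scores` is hypothetical, and how the evaluation script actually builds its dummy predictions is not shown in the log.

```python
# Class supports copied from the Classification Report in the log.
SUPPORT = {"CLEAN": 33997, "OFFENSIVE": 2094, "HATE": 4441}


def constant_prediction_scores(support, predicted_label):
    """Scores obtained when every sample is assigned `predicted_label`."""
    total = sum(support.values())
    f1 = {}
    for label, count in support.items():
        if label == predicted_label:
            precision = count / total  # every sample is predicted as this class
            recall = 1.0               # so every true member is recovered
            f1[label] = 2 * precision * recall / (precision + recall)
        else:
            f1[label] = 0.0            # never predicted: precision and recall are 0
    accuracy = support[predicted_label] / total
    f1_macro = sum(f1.values()) / len(f1)
    f1_weighted = sum(f1[label] * support[label] for label in support) / total
    return accuracy, f1_macro, f1_weighted


accuracy, f1_macro, f1_weighted = constant_prediction_scores(SUPPORT, "CLEAN")
print(f"Accuracy: {accuracy:.4f}")
print(f"F1 Macro: {f1_macro:.4f}")
print(f"F1 Weighted: {f1_weighted:.4f}")
```

Because these values follow from the class distribution alone, the two rows are best read as a majority-class baseline when comparing against the other nine models.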
+
+ DETAILED RESULTS - MBERT
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/mbert
+ Number of Samples: 40532
+ Accuracy: 0.9360
+ F1 Macro: 0.8044
+ F1 Weighted: 0.9317
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9489     0.9876     0.9679     33997.0
+ OFFENSIVE       0.8645     0.5392     0.6641     2094.0
+ HATE            0.8416     0.7287     0.7811     4441.0
+ macro avg       0.8850     0.7518     0.8044     40532.0
+ weighted avg    0.9328     0.9360     0.9317     40532.0
+
+ Confusion Matrix:
+ [[33574    93   330]
+  [  686  1129   279]
+  [ 1121    84  3236]]
+
+ ================================================================================
+
+ DETAILED RESULTS - SPHOBERT
+ --------------------------------------------------
+ Model Path: outputs/hate-speech-detection/sphobert
+ Number of Samples: 40532
+ Accuracy: 0.9143
+ F1 Macro: 0.7378
+ F1 Weighted: 0.9096
+
+ Classification Report:
+ Class           Precision  Recall     F1-Score   Support
+ --------------------------------------------------
+ CLEAN           0.9434     0.9729     0.9579     33997.0
+ OFFENSIVE       0.6821     0.4508     0.5428     2094.0
+ HATE            0.7436     0.6843     0.7127     4441.0
+ macro avg       0.7897     0.7027     0.7378     40532.0
+ weighted avg    0.9080     0.9143     0.9096     40532.0
+
+ Confusion Matrix:
+ [[33077   253   667]
+  [  769   944   381]
+  [ 1215   187  3039]]
+
+ ================================================================================
+
+
+ ============================================================
+ EVALUATION COMPLETED!
+ ============================================================
+ Successfully evaluated: 11/11 models
+
+ Best performing models:
+ 1. vihate-t5: Accuracy=0.9551, F1=0.8718
+ 2. roberta-gru: Accuracy=0.9537, F1=0.8716
+ 3. phobert-v1: Accuracy=0.9421, F1=0.8308
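The "Best performing models" list can be reproduced from the summary table. The log does not say which metric the ranking uses; the sketch below assumes macro F1, and sorting by accuracy happens to give the same top three. The `RESULTS` dictionary and the `top_models` helper are illustrative names, with values copied from the EVALUATION SUMMARY table.

```python
# (accuracy, f1_macro) per model, copied from the EVALUATION SUMMARY table.
RESULTS = {
    "phobert-v1": (0.9421, 0.8308),
    "phobert-v2": (0.9341, 0.8048),
    "bartpho": (0.8985, 0.6791),
    "visobert": (0.9372, 0.8241),
    "vihate-t5": (0.9551, 0.8718),
    "xlm-r": (0.9203, 0.7625),
    "roberta-gru": (0.9537, 0.8716),
    "bilstm": (0.8388, 0.3041),
    "textcnn": (0.8388, 0.3041),
    "mbert": (0.9360, 0.8044),
    "sphobert": (0.9143, 0.7378),
}


def top_models(results, k=3):
    """Return the k model names with the highest macro F1 (assumed ranking key)."""
    ranked = sorted(results, key=lambda name: results[name][1], reverse=True)
    return ranked[:k]


for rank, name in enumerate(top_models(RESULTS), start=1):
    acc, f1 = RESULTS[name]
    print(f"{rank}. {name}: Accuracy={acc:.4f}, F1={f1:.4f}")
```

The gap between the first two entries is 0.0002 macro F1, so the order of vihate-t5 and roberta-gru should not be read as a meaningful difference without further runs.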