AnnyNguyen committed
Commit 5c3fb6f (verified)
Parent(s): 67f1fc7

Delete evaluation_log_bartpho.txt with huggingface_hub

Files changed (1)
  1. evaluation_log_bartpho.txt +0 -443
evaluation_log_bartpho.txt DELETED
@@ -1,443 +0,0 @@
- EVALUATION LOG - 2025-10-29 03:44:41
- ================================================================================
-
-
-
- ================================================================================
- STARTING POST-TRAINING EVALUATION
- ================================================================================
- ✅ Test data loaded: 40532 samples
- Columns: ['dataset', 'type', 'comment', 'label']
- Using device: cuda
-
- ============================================================
- EVALUATING MODEL: PHOBERT-V1
- ============================================================
- ✅ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1
- ✅ Tokenizer loaded for phobert-v1
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9421
- F1 Macro: 0.8308
- F1 Weighted: 0.9394
-
- ============================================================
- EVALUATING MODEL: PHOBERT-V2
- ============================================================
- ✅ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2
- ✅ Tokenizer loaded for phobert-v2
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9341
- F1 Macro: 0.8048
- F1 Weighted: 0.9326
-
- ============================================================
- EVALUATING MODEL: BARTPHO
- ============================================================
- ✅ Model bartpho loaded from outputs/hate-speech-detection/bartpho
- ✅ Tokenizer loaded for bartpho
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.8985
- F1 Macro: 0.6791
- F1 Weighted: 0.8886
-
- ============================================================
- EVALUATING MODEL: VISOBERT
- ============================================================
- ✅ Model visobert loaded from outputs/hate-speech-detection/visobert
- ✅ Tokenizer loaded for visobert
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9372
- F1 Macro: 0.8241
- F1 Weighted: 0.9379
-
- ============================================================
- EVALUATING MODEL: VIHATE-T5
- ============================================================
- ✅ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5
- ✅ Tokenizer loaded for vihate-t5
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9551
- F1 Macro: 0.8718
- F1 Weighted: 0.9535
-
- ============================================================
- EVALUATING MODEL: XLM-R
- ============================================================
- ✅ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r
- ✅ Tokenizer loaded for xlm-r
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9203
- F1 Macro: 0.7625
- F1 Weighted: 0.9177
-
- ============================================================
- EVALUATING MODEL: ROBERTA-GRU
- ============================================================
- ✅ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru
- ✅ Tokenizer loaded for roberta-gru
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9537
- F1 Macro: 0.8716
- F1 Weighted: 0.9530
-
- ============================================================
- EVALUATING MODEL: BILSTM
- ============================================================
- ✅ Model bilstm loaded from outputs/hate-speech-detection/bilstm
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ℹ️ BILSTM evaluation requires special handling
- Using dummy predictions for BILSTM
- ✅ Evaluation completed!
- Accuracy: 0.8388
- F1 Macro: 0.3041
- F1 Weighted: 0.7652
-
- ============================================================
- EVALUATING MODEL: TEXTCNN
- ============================================================
- ✅ Model textcnn loaded from outputs/hate-speech-detection/textcnn
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ℹ️ TEXTCNN evaluation requires special handling
- Using dummy predictions for TEXTCNN
- ✅ Evaluation completed!
- Accuracy: 0.8388
- F1 Macro: 0.3041
- F1 Weighted: 0.7652
-
- ============================================================
- EVALUATING MODEL: MBERT
- ============================================================
- ✅ Model mbert loaded from outputs/hate-speech-detection/mbert
- ✅ Tokenizer loaded for mbert
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9360
- F1 Macro: 0.8044
- F1 Weighted: 0.9317
-
- ============================================================
- EVALUATING MODEL: SPHOBERT
- ============================================================
- ✅ Model sphobert loaded from outputs/hate-speech-detection/sphobert
- ✅ Tokenizer loaded for sphobert
- Evaluating on 40532 samples...
- Text column: comment, Label column: label
- ✅ Evaluation completed!
- Accuracy: 0.9143
- F1 Macro: 0.7378
- F1 Weighted: 0.9096
-
-
- ================================================================================
- FINAL EVALUATION RESULTS - 2025-10-29 04:14:15
- ================================================================================
-
- EVALUATION SUMMARY
- --------------------------------------------------
- Model         Accuracy   F1 Macro   F1 Weighted   Samples
- --------------------------------------------------
- phobert-v1    0.9421     0.8308     0.9394        40532
- phobert-v2    0.9341     0.8048     0.9326        40532
- bartpho       0.8985     0.6791     0.8886        40532
- visobert      0.9372     0.8241     0.9379        40532
- vihate-t5     0.9551     0.8718     0.9535        40532
- xlm-r         0.9203     0.7625     0.9177        40532
- roberta-gru   0.9537     0.8716     0.9530        40532
- bilstm        0.8388     0.3041     0.7652        40532
- textcnn       0.8388     0.3041     0.7652        40532
- mbert         0.9360     0.8044     0.9317        40532
- sphobert      0.9143     0.7378     0.9096        40532
-
- ================================================================================
-
- DETAILED RESULTS - PHOBERT-V1
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/phobert-v1
- Number of Samples: 40532
- Accuracy: 0.9421
- F1 Macro: 0.8308
- F1 Weighted: 0.9394
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9554      0.9868   0.9709     33997.0
- OFFENSIVE      0.7910      0.6581   0.7185      2094.0
- HATE           0.8866      0.7341   0.8032      4441.0
- macro avg      0.8777      0.7930   0.8308     40532.0
- weighted avg   0.9394      0.9421   0.9394     40532.0
-
- Confusion Matrix:
- [[33548   196   253]
-  [  552  1378   164]
-  [ 1013   168  3260]]
-
- ================================================================================
-
- DETAILED RESULTS - PHOBERT-V2
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/phobert-v2
- Number of Samples: 40532
- Accuracy: 0.9341
- F1 Macro: 0.8048
- F1 Weighted: 0.9326
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9635      0.9739   0.9687     33997.0
- OFFENSIVE      0.7505      0.5903   0.6608      2094.0
- HATE           0.7779      0.7919   0.7849      4441.0
- macro avg      0.8306      0.7854   0.8048     40532.0
- weighted avg   0.9321      0.9341   0.9326     40532.0
-
- Confusion Matrix:
- [[33109   219   669]
-  [  523  1236   335]
-  [  732   192  3517]]
-
- ================================================================================
-
- DETAILED RESULTS - BARTPHO
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/bartpho
- Number of Samples: 40532
- Accuracy: 0.8985
- F1 Macro: 0.6791
- F1 Weighted: 0.8886
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9228      0.9770   0.9491     33997.0
- OFFENSIVE      0.6527      0.3563   0.4609      2094.0
- HATE           0.7238      0.5535   0.6273      4441.0
- macro avg      0.7664      0.6289   0.6791     40532.0
- weighted avg   0.8871      0.8985   0.8886     40532.0
-
- Confusion Matrix:
- [[33215   235   547]
-  [  957   746   391]
-  [ 1821   162  2458]]
-
- ================================================================================
-
- DETAILED RESULTS - VISOBERT
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/visobert
- Number of Samples: 40532
- Accuracy: 0.9372
- F1 Macro: 0.8241
- F1 Weighted: 0.9379
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9714      0.9687   0.9700     33997.0
- OFFENSIVE      0.6463      0.7574   0.6974      2094.0
- HATE           0.8305      0.7809   0.8049      4441.0
- macro avg      0.8160      0.8357   0.8241     40532.0
- weighted avg   0.9392      0.9372   0.9379     40532.0
-
- Confusion Matrix:
- [[32932   590   475]
-  [  275  1586   233]
-  [  695   278  3468]]
-
- ================================================================================
-
- DETAILED RESULTS - VIHATE-T5
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/vihate-t5
- Number of Samples: 40532
- Accuracy: 0.9551
- F1 Macro: 0.8718
- F1 Weighted: 0.9535
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9660      0.9883   0.9770     33997.0
- OFFENSIVE      0.8788      0.7096   0.7852      2094.0
- HATE           0.8931      0.8165   0.8531      4441.0
- macro avg      0.9126      0.8381   0.8718     40532.0
- weighted avg   0.9535      0.9551   0.9535     40532.0
-
- Confusion Matrix:
- [[33599   124   274]
-  [  448  1486   160]
-  [  734    81  3626]]
-
- ================================================================================
-
- DETAILED RESULTS - XLM-R
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/xlm-r
- Number of Samples: 40532
- Accuracy: 0.9203
- F1 Macro: 0.7625
- F1 Weighted: 0.9177
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9514      0.9733   0.9622     33997.0
- OFFENSIVE      0.6284      0.5702   0.5979      2094.0
- HATE           0.7834      0.6791   0.7275      4441.0
- macro avg      0.7877      0.7409   0.7625     40532.0
- weighted avg   0.9163      0.9203   0.9177     40532.0
-
- Confusion Matrix:
- [[33090   418   489]
-  [  555  1194   345]
-  [ 1137   288  3016]]
-
- ================================================================================
-
- DETAILED RESULTS - ROBERTA-GRU
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/roberta-gru
- Number of Samples: 40532
- Accuracy: 0.9537
- F1 Macro: 0.8716
- F1 Weighted: 0.9530
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9711      0.9825   0.9768     33997.0
- OFFENSIVE      0.8136      0.7693   0.7909      2094.0
- HATE           0.8761      0.8201   0.8472      4441.0
- macro avg      0.8870      0.8573   0.8716     40532.0
- weighted avg   0.9526      0.9537   0.9530     40532.0
-
- Confusion Matrix:
- [[33402   237   358]
-  [  326  1611   157]
-  [  667   132  3642]]
-
- ================================================================================
-
- DETAILED RESULTS - BILSTM
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/bilstm
- Number of Samples: 40532
- Accuracy: 0.8388
- F1 Macro: 0.3041
- F1 Weighted: 0.7652
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.8388      1.0000   0.9123     33997.0
- OFFENSIVE      0.0000      0.0000   0.0000      2094.0
- HATE           0.0000      0.0000   0.0000      4441.0
- macro avg      0.2796      0.3333   0.3041     40532.0
- weighted avg   0.7035      0.8388   0.7652     40532.0
-
- Confusion Matrix:
- [[33997     0     0]
-  [ 2094     0     0]
-  [ 4441     0     0]]
-
- ================================================================================
-
- DETAILED RESULTS - TEXTCNN
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/textcnn
- Number of Samples: 40532
- Accuracy: 0.8388
- F1 Macro: 0.3041
- F1 Weighted: 0.7652
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.8388      1.0000   0.9123     33997.0
- OFFENSIVE      0.0000      0.0000   0.0000      2094.0
- HATE           0.0000      0.0000   0.0000      4441.0
- macro avg      0.2796      0.3333   0.3041     40532.0
- weighted avg   0.7035      0.8388   0.7652     40532.0
-
- Confusion Matrix:
- [[33997     0     0]
-  [ 2094     0     0]
-  [ 4441     0     0]]
-
- ================================================================================
-
- DETAILED RESULTS - MBERT
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/mbert
- Number of Samples: 40532
- Accuracy: 0.9360
- F1 Macro: 0.8044
- F1 Weighted: 0.9317
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9489      0.9876   0.9679     33997.0
- OFFENSIVE      0.8645      0.5392   0.6641      2094.0
- HATE           0.8416      0.7287   0.7811      4441.0
- macro avg      0.8850      0.7518   0.8044     40532.0
- weighted avg   0.9328      0.9360   0.9317     40532.0
-
- Confusion Matrix:
- [[33574    93   330]
-  [  686  1129   279]
-  [ 1121    84  3236]]
-
- ================================================================================
-
- DETAILED RESULTS - SPHOBERT
- --------------------------------------------------
- Model Path: outputs/hate-speech-detection/sphobert
- Number of Samples: 40532
- Accuracy: 0.9143
- F1 Macro: 0.7378
- F1 Weighted: 0.9096
-
- Classification Report:
- Class          Precision   Recall   F1-Score   Support
- --------------------------------------------------
- CLEAN          0.9434      0.9729   0.9579     33997.0
- OFFENSIVE      0.6821      0.4508   0.5428      2094.0
- HATE           0.7436      0.6843   0.7127      4441.0
- macro avg      0.7897      0.7027   0.7378     40532.0
- weighted avg   0.9080      0.9143   0.9096     40532.0
-
- Confusion Matrix:
- [[33077   253   667]
-  [  769   944   381]
-  [ 1215   187  3039]]
-
- ================================================================================
-
-
- ============================================================
- EVALUATION COMPLETED!
- ============================================================
- Successfully evaluated: 11/11 models
-
- Best performing models:
- 1. vihate-t5: Accuracy=0.9551, F1=0.8718
- 2. roberta-gru: Accuracy=0.9537, F1=0.8716
- 3. phobert-v1: Accuracy=0.9421, F1=0.8308
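The headline metrics in the deleted log are internally consistent with its confusion matrices. As a sanity check, here is a minimal sketch (assuming scikit-learn's standard metric conventions; the log does not say which library produced the numbers) that re-derives the bilstm/textcnn dummy-baseline figures, where every one of the 40532 samples was predicted CLEAN:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Confusion matrix reported for bilstm/textcnn: rows = true class
# (CLEAN, OFFENSIVE, HATE), columns = predicted class. Every sample
# was predicted CLEAN (column 0), i.e. a majority-class baseline.
cm = np.array([[33997, 0, 0],
               [ 2094, 0, 0],
               [ 4441, 0, 0]])

# Expand the matrix back into flat per-sample label arrays.
y_true = np.repeat(np.arange(3), cm.sum(axis=1))
y_pred = np.concatenate([np.repeat(np.arange(3), row) for row in cm])

print(round(accuracy_score(y_true, y_pred), 4))                                 # 0.8388
print(round(f1_score(y_true, y_pred, average="macro", zero_division=0), 4))     # 0.3041
print(round(f1_score(y_true, y_pred, average="weighted", zero_division=0), 4))  # 0.7652
```

This matches the logged Accuracy 0.8388, F1 Macro 0.3041, and F1 Weighted 0.7652, and explains why macro F1 collapses (two of three classes get F1 = 0) while accuracy stays high on the imbalanced test set.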