EVALUATION LOG - 2025-10-29 03:44:41
================================================================================



================================================================================
STARTING POST-TRAINING EVALUATION
================================================================================
✅ Test data loaded: 40532 samples
   Columns: ['dataset', 'type', 'comment', 'label']
Using device: cuda

============================================================
EVALUATING MODEL: PHOBERT-V1
============================================================
✅ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1
✅ Tokenizer loaded for phobert-v1
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9421
   F1 Macro: 0.8308
   F1 Weighted: 0.9394
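The evaluation script itself is not part of this log, but the three metrics reported for each model could plausibly be computed along these lines with scikit-learn; `y_true` and `y_pred` below are hypothetical placeholder arrays, not values from this run.

```python
# Hedged sketch of the metric computation; the actual evaluation code is not
# shown in this log. y_true / y_pred are illustrative placeholders using the
# same label scheme: 0 = CLEAN, 1 = OFFENSIVE, 2 = HATE.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 1, 2, 2, 0]
y_pred = [0, 0, 1, 2, 0, 0]

accuracy = accuracy_score(y_true, y_pred)                   # fraction correct
f1_macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # support-weighted mean
print(f"Accuracy: {accuracy:.4f}  F1 Macro: {f1_macro:.4f}  F1 Weighted: {f1_weighted:.4f}")
```

F1 Macro treats the three classes equally, so it penalizes weak minority-class performance that the support-weighted scores can hide; this is why the gap between F1 Macro and F1 Weighted is so large for the weaker models below.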

============================================================
EVALUATING MODEL: PHOBERT-V2
============================================================
✅ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2
✅ Tokenizer loaded for phobert-v2
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9341
   F1 Macro: 0.8048
   F1 Weighted: 0.9326

============================================================
EVALUATING MODEL: BARTPHO
============================================================
✅ Model bartpho loaded from outputs/hate-speech-detection/bartpho
✅ Tokenizer loaded for bartpho
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.8985
   F1 Macro: 0.6791
   F1 Weighted: 0.8886

============================================================
EVALUATING MODEL: VISOBERT
============================================================
✅ Model visobert loaded from outputs/hate-speech-detection/visobert
✅ Tokenizer loaded for visobert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9372
   F1 Macro: 0.8241
   F1 Weighted: 0.9379

============================================================
EVALUATING MODEL: VIHATE-T5
============================================================
✅ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5
✅ Tokenizer loaded for vihate-t5
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9551
   F1 Macro: 0.8718
   F1 Weighted: 0.9535

============================================================
EVALUATING MODEL: XLM-R
============================================================
✅ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r
✅ Tokenizer loaded for xlm-r
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9203
   F1 Macro: 0.7625
   F1 Weighted: 0.9177

============================================================
EVALUATING MODEL: ROBERTA-GRU
============================================================
✅ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru
✅ Tokenizer loaded for roberta-gru
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9537
   F1 Macro: 0.8716
   F1 Weighted: 0.9530

============================================================
EVALUATING MODEL: BILSTM
============================================================
✅ Model bilstm loaded from outputs/hate-speech-detection/bilstm
Evaluating on 40532 samples...
Text column: comment, Label column: label
ℹ️  BILSTM evaluation requires special handling
Using dummy predictions for BILSTM
✅ Evaluation completed!
   Accuracy: 0.8388
   F1 Macro: 0.3041
   F1 Weighted: 0.7652

============================================================
EVALUATING MODEL: TEXTCNN
============================================================
✅ Model textcnn loaded from outputs/hate-speech-detection/textcnn
Evaluating on 40532 samples...
Text column: comment, Label column: label
ℹ️  TEXTCNN evaluation requires special handling
Using dummy predictions for TEXTCNN
✅ Evaluation completed!
   Accuracy: 0.8388
   F1 Macro: 0.3041
   F1 Weighted: 0.7652

============================================================
EVALUATING MODEL: MBERT
============================================================
✅ Model mbert loaded from outputs/hate-speech-detection/mbert
✅ Tokenizer loaded for mbert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9360
   F1 Macro: 0.8044
   F1 Weighted: 0.9317

============================================================
EVALUATING MODEL: SPHOBERT
============================================================
✅ Model sphobert loaded from outputs/hate-speech-detection/sphobert
✅ Tokenizer loaded for sphobert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✅ Evaluation completed!
   Accuracy: 0.9143
   F1 Macro: 0.7378
   F1 Weighted: 0.9096


================================================================================
FINAL EVALUATION RESULTS - 2025-10-29 04:14:15
================================================================================

EVALUATION SUMMARY
--------------------------------------------------
Model                Accuracy   F1 Macro   F1 Weighted  Samples 
--------------------------------------------------
phobert-v1           0.9421     0.8308     0.9394       40532   
phobert-v2           0.9341     0.8048     0.9326       40532   
bartpho              0.8985     0.6791     0.8886       40532   
visobert             0.9372     0.8241     0.9379       40532   
vihate-t5            0.9551     0.8718     0.9535       40532   
xlm-r                0.9203     0.7625     0.9177       40532   
roberta-gru          0.9537     0.8716     0.9530       40532   
bilstm               0.8388     0.3041     0.7652       40532   
textcnn              0.8388     0.3041     0.7652       40532   
mbert                0.9360     0.8044     0.9317       40532   
sphobert             0.9143     0.7378     0.9096       40532   
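The "Best performing models" ranking printed at the end of the log follows directly from sorting the summary rows by F1 Macro; a quick check with the values copied verbatim from the table above:

```python
# F1 Macro values copied from the evaluation summary table.
f1_macro = {
    "phobert-v1": 0.8308, "phobert-v2": 0.8048, "bartpho": 0.6791,
    "visobert": 0.8241, "vihate-t5": 0.8718, "xlm-r": 0.7625,
    "roberta-gru": 0.8716, "bilstm": 0.3041, "textcnn": 0.3041,
    "mbert": 0.8044, "sphobert": 0.7378,
}
top3 = sorted(f1_macro, key=f1_macro.get, reverse=True)[:3]
print(top3)  # ['vihate-t5', 'roberta-gru', 'phobert-v1']
```

Note that vihate-t5 and roberta-gru are separated by only 0.0002 F1 Macro, so the top two are effectively tied.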

================================================================================

DETAILED RESULTS - PHOBERT-V1
--------------------------------------------------
Model Path: outputs/hate-speech-detection/phobert-v1
Number of Samples: 40532
Accuracy: 0.9421
F1 Macro: 0.8308
F1 Weighted: 0.9394

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9554     0.9868     0.9709     33997.0 
OFFENSIVE  0.7910     0.6581     0.7185     2094.0  
HATE       0.8866     0.7341     0.8032     4441.0  
macro avg  0.8777     0.7930     0.8308     40532.0 
weighted avg 0.9394     0.9421     0.9394     40532.0 

Confusion Matrix:
[[33548   196   253]
 [  552  1378   164]
 [ 1013   168  3260]]
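As a sanity check (not part of the original evaluation script), the reported accuracy can be recovered from the confusion matrix above: rows are true classes, columns are predicted classes, and the diagonal holds the correct predictions.

```python
# Recompute accuracy from the phobert-v1 confusion matrix printed above.
cm = [
    [33548,   196,   253],  # true CLEAN
    [  552,  1378,   164],  # true OFFENSIVE
    [ 1013,   168,  3260],  # true HATE
]
total = sum(sum(row) for row in cm)              # 40532 samples
correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal = correct predictions
print(f"Accuracy: {correct / total:.4f}")        # 0.9421, matching the report
```

The same identity holds for every per-model block below.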

================================================================================

DETAILED RESULTS - PHOBERT-V2
--------------------------------------------------
Model Path: outputs/hate-speech-detection/phobert-v2
Number of Samples: 40532
Accuracy: 0.9341
F1 Macro: 0.8048
F1 Weighted: 0.9326

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9635     0.9739     0.9687     33997.0 
OFFENSIVE  0.7505     0.5903     0.6608     2094.0  
HATE       0.7779     0.7919     0.7849     4441.0  
macro avg  0.8306     0.7854     0.8048     40532.0 
weighted avg 0.9321     0.9341     0.9326     40532.0 

Confusion Matrix:
[[33109   219   669]
 [  523  1236   335]
 [  732   192  3517]]

================================================================================

DETAILED RESULTS - BARTPHO
--------------------------------------------------
Model Path: outputs/hate-speech-detection/bartpho
Number of Samples: 40532
Accuracy: 0.8985
F1 Macro: 0.6791
F1 Weighted: 0.8886

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9228     0.9770     0.9491     33997.0 
OFFENSIVE  0.6527     0.3563     0.4609     2094.0  
HATE       0.7238     0.5535     0.6273     4441.0  
macro avg  0.7664     0.6289     0.6791     40532.0 
weighted avg 0.8871     0.8985     0.8886     40532.0 

Confusion Matrix:
[[33215   235   547]
 [  957   746   391]
 [ 1821   162  2458]]

================================================================================

DETAILED RESULTS - VISOBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/visobert
Number of Samples: 40532
Accuracy: 0.9372
F1 Macro: 0.8241
F1 Weighted: 0.9379

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9714     0.9687     0.9700     33997.0 
OFFENSIVE  0.6463     0.7574     0.6974     2094.0  
HATE       0.8305     0.7809     0.8049     4441.0  
macro avg  0.8160     0.8357     0.8241     40532.0 
weighted avg 0.9392     0.9372     0.9379     40532.0 

Confusion Matrix:
[[32932   590   475]
 [  275  1586   233]
 [  695   278  3468]]

================================================================================

DETAILED RESULTS - VIHATE-T5
--------------------------------------------------
Model Path: outputs/hate-speech-detection/vihate-t5
Number of Samples: 40532
Accuracy: 0.9551
F1 Macro: 0.8718
F1 Weighted: 0.9535

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9660     0.9883     0.9770     33997.0 
OFFENSIVE  0.8788     0.7096     0.7852     2094.0  
HATE       0.8931     0.8165     0.8531     4441.0  
macro avg  0.9126     0.8381     0.8718     40532.0 
weighted avg 0.9535     0.9551     0.9535     40532.0 

Confusion Matrix:
[[33599   124   274]
 [  448  1486   160]
 [  734    81  3626]]

================================================================================

DETAILED RESULTS - XLM-R
--------------------------------------------------
Model Path: outputs/hate-speech-detection/xlm-r
Number of Samples: 40532
Accuracy: 0.9203
F1 Macro: 0.7625
F1 Weighted: 0.9177

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9514     0.9733     0.9622     33997.0 
OFFENSIVE  0.6284     0.5702     0.5979     2094.0  
HATE       0.7834     0.6791     0.7275     4441.0  
macro avg  0.7877     0.7409     0.7625     40532.0 
weighted avg 0.9163     0.9203     0.9177     40532.0 

Confusion Matrix:
[[33090   418   489]
 [  555  1194   345]
 [ 1137   288  3016]]

================================================================================

DETAILED RESULTS - ROBERTA-GRU
--------------------------------------------------
Model Path: outputs/hate-speech-detection/roberta-gru
Number of Samples: 40532
Accuracy: 0.9537
F1 Macro: 0.8716
F1 Weighted: 0.9530

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9711     0.9825     0.9768     33997.0 
OFFENSIVE  0.8136     0.7693     0.7909     2094.0  
HATE       0.8761     0.8201     0.8472     4441.0  
macro avg  0.8870     0.8573     0.8716     40532.0 
weighted avg 0.9526     0.9537     0.9530     40532.0 

Confusion Matrix:
[[33402   237   358]
 [  326  1611   157]
 [  667   132  3642]]

================================================================================

DETAILED RESULTS - BILSTM
--------------------------------------------------
Model Path: outputs/hate-speech-detection/bilstm
Number of Samples: 40532
Accuracy: 0.8388
F1 Macro: 0.3041
F1 Weighted: 0.7652

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.8388     1.0000     0.9123     33997.0 
OFFENSIVE  0.0000     0.0000     0.0000     2094.0  
HATE       0.0000     0.0000     0.0000     4441.0  
macro avg  0.2796     0.3333     0.3041     40532.0 
weighted avg 0.7035     0.8388     0.7652     40532.0 

Confusion Matrix:
[[33997     0     0]
 [ 2094     0     0]
 [ 4441     0     0]]
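The BILSTM numbers (and the identical TextCNN numbers below) are exactly what a majority-class baseline produces: the confusion matrix shows every sample predicted CLEAN, so all three metrics follow from the class supports alone. A quick check under that assumption:

```python
# Reproduce the dummy-prediction metrics from the class supports alone,
# assuming every sample is predicted CLEAN (as the confusion matrix shows).
support = {"CLEAN": 33997, "OFFENSIVE": 2094, "HATE": 4441}
total = sum(support.values())

accuracy = support["CLEAN"] / total         # recall for CLEAN is 1.0
precision_clean = support["CLEAN"] / total  # every prediction is CLEAN
f1_clean = 2 * precision_clean / (precision_clean + 1.0)
f1_macro = f1_clean / 3                     # OFFENSIVE and HATE F1 are both 0
f1_weighted = f1_clean * support["CLEAN"] / total
print(round(accuracy, 4), round(f1_macro, 4), round(f1_weighted, 4))
# → 0.8388 0.3041 0.7652, matching the report
```

This is why the F1 Macro of 0.3041 is the floor to beat on this test set: any model matching it learned nothing beyond the class prior.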

================================================================================

DETAILED RESULTS - TEXTCNN
--------------------------------------------------
Model Path: outputs/hate-speech-detection/textcnn
Number of Samples: 40532
Accuracy: 0.8388
F1 Macro: 0.3041
F1 Weighted: 0.7652

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.8388     1.0000     0.9123     33997.0 
OFFENSIVE  0.0000     0.0000     0.0000     2094.0  
HATE       0.0000     0.0000     0.0000     4441.0  
macro avg  0.2796     0.3333     0.3041     40532.0 
weighted avg 0.7035     0.8388     0.7652     40532.0 

Confusion Matrix:
[[33997     0     0]
 [ 2094     0     0]
 [ 4441     0     0]]

================================================================================

DETAILED RESULTS - MBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/mbert
Number of Samples: 40532
Accuracy: 0.9360
F1 Macro: 0.8044
F1 Weighted: 0.9317

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9489     0.9876     0.9679     33997.0 
OFFENSIVE  0.8645     0.5392     0.6641     2094.0  
HATE       0.8416     0.7287     0.7811     4441.0  
macro avg  0.8850     0.7518     0.8044     40532.0 
weighted avg 0.9328     0.9360     0.9317     40532.0 

Confusion Matrix:
[[33574    93   330]
 [  686  1129   279]
 [ 1121    84  3236]]

================================================================================

DETAILED RESULTS - SPHOBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/sphobert
Number of Samples: 40532
Accuracy: 0.9143
F1 Macro: 0.7378
F1 Weighted: 0.9096

Classification Report:
Class      Precision  Recall     F1-Score   Support 
--------------------------------------------------
CLEAN      0.9434     0.9729     0.9579     33997.0 
OFFENSIVE  0.6821     0.4508     0.5428     2094.0  
HATE       0.7436     0.6843     0.7127     4441.0  
macro avg  0.7897     0.7027     0.7378     40532.0 
weighted avg 0.9080     0.9143     0.9096     40532.0 

Confusion Matrix:
[[33077   253   667]
 [  769   944   381]
 [ 1215   187  3039]]

================================================================================


============================================================
EVALUATION COMPLETED!
============================================================
Successfully evaluated: 11/11 models (note: bilstm and textcnn fell back to dummy predictions)

Best performing models:
  1. vihate-t5: Accuracy=0.9551, F1=0.8718
  2. roberta-gru: Accuracy=0.9537, F1=0.8716
  3. phobert-v1: Accuracy=0.9421, F1=0.8308