EVALUATION LOG - 2025-10-29 03:44:41
================================================================================

================================================================================
STARTING POST-TRAINING EVALUATION
================================================================================
✓ Test data loaded: 40532 samples
  Columns: ['dataset', 'type', 'comment', 'label']
Using device: cuda

============================================================
EVALUATING MODEL: PHOBERT-V1
============================================================
✓ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1
✓ Tokenizer loaded for phobert-v1
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9421
  F1 Macro:    0.8308
  F1 Weighted: 0.9394

============================================================
EVALUATING MODEL: PHOBERT-V2
============================================================
✓ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2
✓ Tokenizer loaded for phobert-v2
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9341
  F1 Macro:    0.8048
  F1 Weighted: 0.9326

============================================================
EVALUATING MODEL: BARTPHO
============================================================
✓ Model bartpho loaded from outputs/hate-speech-detection/bartpho
✓ Tokenizer loaded for bartpho
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.8985
  F1 Macro:    0.6791
  F1 Weighted: 0.8886

============================================================
EVALUATING MODEL: VISOBERT
============================================================
✓ Model visobert loaded from outputs/hate-speech-detection/visobert
✓ Tokenizer loaded for visobert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9372
  F1 Macro:    0.8241
  F1 Weighted: 0.9379

============================================================
EVALUATING MODEL: VIHATE-T5
============================================================
✓ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5
✓ Tokenizer loaded for vihate-t5
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9551
  F1 Macro:    0.8718
  F1 Weighted: 0.9535

============================================================
EVALUATING MODEL: XLM-R
============================================================
✓ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r
✓ Tokenizer loaded for xlm-r
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9203
  F1 Macro:    0.7625
  F1 Weighted: 0.9177

============================================================
EVALUATING MODEL: ROBERTA-GRU
============================================================
✓ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru
✓ Tokenizer loaded for roberta-gru
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9537
  F1 Macro:    0.8716
  F1 Weighted: 0.9530

============================================================
EVALUATING MODEL: BILSTM
============================================================
✓ Model bilstm loaded from outputs/hate-speech-detection/bilstm
Evaluating on 40532 samples...
Text column: comment, Label column: label
ℹ️ BILSTM evaluation requires special handling
Using dummy predictions for BILSTM
✓ Evaluation completed!
  Accuracy:    0.8388
  F1 Macro:    0.3041
  F1 Weighted: 0.7652

============================================================
EVALUATING MODEL: TEXTCNN
============================================================
✓ Model textcnn loaded from outputs/hate-speech-detection/textcnn
Evaluating on 40532 samples...
Text column: comment, Label column: label
ℹ️ TEXTCNN evaluation requires special handling
Using dummy predictions for TEXTCNN
✓ Evaluation completed!
  Accuracy:    0.8388
  F1 Macro:    0.3041
  F1 Weighted: 0.7652

============================================================
EVALUATING MODEL: MBERT
============================================================
✓ Model mbert loaded from outputs/hate-speech-detection/mbert
✓ Tokenizer loaded for mbert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9360
  F1 Macro:    0.8044
  F1 Weighted: 0.9317

============================================================
EVALUATING MODEL: SPHOBERT
============================================================
✓ Model sphobert loaded from outputs/hate-speech-detection/sphobert
✓ Tokenizer loaded for sphobert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
  Accuracy:    0.9143
  F1 Macro:    0.7378
  F1 Weighted: 0.9096

================================================================================
FINAL EVALUATION RESULTS - 2025-10-29 04:14:15
================================================================================

EVALUATION SUMMARY
--------------------------------------------------
Model         Accuracy  F1 Macro  F1 Weighted  Samples
--------------------------------------------------
phobert-v1      0.9421    0.8308       0.9394    40532
phobert-v2      0.9341    0.8048       0.9326    40532
bartpho         0.8985    0.6791       0.8886    40532
visobert        0.9372    0.8241       0.9379    40532
vihate-t5       0.9551    0.8718       0.9535    40532
xlm-r           0.9203    0.7625       0.9177    40532
roberta-gru     0.9537    0.8716       0.9530    40532
bilstm          0.8388    0.3041       0.7652    40532
textcnn         0.8388    0.3041       0.7652    40532
mbert           0.9360    0.8044       0.9317    40532
sphobert        0.9143    0.7378       0.9096    40532

================================================================================
DETAILED RESULTS - PHOBERT-V1
--------------------------------------------------
Model Path: outputs/hate-speech-detection/phobert-v1
Number of Samples: 40532
Accuracy:    0.9421
F1 Macro:    0.8308
F1 Weighted: 0.9394

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9554  0.9868    0.9709  33997.0
OFFENSIVE        0.7910  0.6581    0.7185   2094.0
HATE             0.8866  0.7341    0.8032   4441.0
macro avg        0.8777  0.7930    0.8308  40532.0
weighted avg     0.9394  0.9421    0.9394  40532.0

Confusion Matrix:
[[33548   196   253]
 [  552  1378   164]
 [ 1013   168  3260]]

================================================================================
DETAILED RESULTS - PHOBERT-V2
--------------------------------------------------
Model Path: outputs/hate-speech-detection/phobert-v2
Number of Samples: 40532
Accuracy:    0.9341
F1 Macro:    0.8048
F1 Weighted: 0.9326

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9635  0.9739    0.9687  33997.0
OFFENSIVE        0.7505  0.5903    0.6608   2094.0
HATE             0.7779  0.7919    0.7849   4441.0
macro avg        0.8306  0.7854    0.8048  40532.0
weighted avg     0.9321  0.9341    0.9326  40532.0

Confusion Matrix:
[[33109   219   669]
 [  523  1236   335]
 [  732   192  3517]]

================================================================================
DETAILED RESULTS - BARTPHO
--------------------------------------------------
Model Path: outputs/hate-speech-detection/bartpho
Number of Samples: 40532
Accuracy:    0.8985
F1 Macro:    0.6791
F1 Weighted: 0.8886

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9228  0.9770    0.9491  33997.0
OFFENSIVE        0.6527  0.3563    0.4609   2094.0
HATE             0.7238  0.5535    0.6273   4441.0
macro avg        0.7664  0.6289    0.6791  40532.0
weighted avg     0.8871  0.8985    0.8886  40532.0

Confusion Matrix:
[[33215   235   547]
 [  957   746   391]
 [ 1821   162  2458]]

================================================================================
DETAILED RESULTS - VISOBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/visobert
Number of Samples: 40532
Accuracy:    0.9372
F1 Macro:    0.8241
F1 Weighted: 0.9379

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9714  0.9687    0.9700  33997.0
OFFENSIVE        0.6463  0.7574    0.6974   2094.0
HATE             0.8305  0.7809    0.8049   4441.0
macro avg        0.8160  0.8357    0.8241  40532.0
weighted avg     0.9392  0.9372    0.9379  40532.0

Confusion Matrix:
[[32932   590   475]
 [  275  1586   233]
 [  695   278  3468]]

================================================================================
DETAILED RESULTS - VIHATE-T5
--------------------------------------------------
Model Path: outputs/hate-speech-detection/vihate-t5
Number of Samples: 40532
Accuracy:    0.9551
F1 Macro:    0.8718
F1 Weighted: 0.9535

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9660  0.9883    0.9770  33997.0
OFFENSIVE        0.8788  0.7096    0.7852   2094.0
HATE             0.8931  0.8165    0.8531   4441.0
macro avg        0.9126  0.8381    0.8718  40532.0
weighted avg     0.9535  0.9551    0.9535  40532.0

Confusion Matrix:
[[33599   124   274]
 [  448  1486   160]
 [  734    81  3626]]

================================================================================
DETAILED RESULTS - XLM-R
--------------------------------------------------
Model Path: outputs/hate-speech-detection/xlm-r
Number of Samples: 40532
Accuracy:    0.9203
F1 Macro:    0.7625
F1 Weighted: 0.9177

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9514  0.9733    0.9622  33997.0
OFFENSIVE        0.6284  0.5702    0.5979   2094.0
HATE             0.7834  0.6791    0.7275   4441.0
macro avg        0.7877  0.7409    0.7625  40532.0
weighted avg     0.9163  0.9203    0.9177  40532.0

Confusion Matrix:
[[33090   418   489]
 [  555  1194   345]
 [ 1137   288  3016]]

================================================================================
DETAILED RESULTS - ROBERTA-GRU
--------------------------------------------------
Model Path: outputs/hate-speech-detection/roberta-gru
Number of Samples: 40532
Accuracy:    0.9537
F1 Macro:    0.8716
F1 Weighted: 0.9530

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9711  0.9825    0.9768  33997.0
OFFENSIVE        0.8136  0.7693    0.7909   2094.0
HATE             0.8761  0.8201    0.8472   4441.0
macro avg        0.8870  0.8573    0.8716  40532.0
weighted avg     0.9526  0.9537    0.9530  40532.0

Confusion Matrix:
[[33402   237   358]
 [  326  1611   157]
 [  667   132  3642]]

================================================================================
DETAILED RESULTS - BILSTM
--------------------------------------------------
Model Path: outputs/hate-speech-detection/bilstm
Number of Samples: 40532
Accuracy:    0.8388
F1 Macro:    0.3041
F1 Weighted: 0.7652

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.8388  1.0000    0.9123  33997.0
OFFENSIVE        0.0000  0.0000    0.0000   2094.0
HATE             0.0000  0.0000    0.0000   4441.0
macro avg        0.2796  0.3333    0.3041  40532.0
weighted avg     0.7035  0.8388    0.7652  40532.0

Confusion Matrix:
[[33997     0     0]
 [ 2094     0     0]
 [ 4441     0     0]]

================================================================================
DETAILED RESULTS - TEXTCNN
--------------------------------------------------
Model Path: outputs/hate-speech-detection/textcnn
Number of Samples: 40532
Accuracy:    0.8388
F1 Macro:    0.3041
F1 Weighted: 0.7652

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.8388  1.0000    0.9123  33997.0
OFFENSIVE        0.0000  0.0000    0.0000   2094.0
HATE             0.0000  0.0000    0.0000   4441.0
macro avg        0.2796  0.3333    0.3041  40532.0
weighted avg     0.7035  0.8388    0.7652  40532.0

Confusion Matrix:
[[33997     0     0]
 [ 2094     0     0]
 [ 4441     0     0]]

================================================================================
DETAILED RESULTS - MBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/mbert
Number of Samples: 40532
Accuracy:    0.9360
F1 Macro:    0.8044
F1 Weighted: 0.9317

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9489  0.9876    0.9679  33997.0
OFFENSIVE        0.8645  0.5392    0.6641   2094.0
HATE             0.8416  0.7287    0.7811   4441.0
macro avg        0.8850  0.7518    0.8044  40532.0
weighted avg     0.9328  0.9360    0.9317  40532.0

Confusion Matrix:
[[33574    93   330]
 [  686  1129   279]
 [ 1121    84  3236]]

================================================================================
DETAILED RESULTS - SPHOBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/sphobert
Number of Samples: 40532
Accuracy:    0.9143
F1 Macro:    0.7378
F1 Weighted: 0.9096

Classification Report:
Class         Precision  Recall  F1-Score  Support
--------------------------------------------------
CLEAN            0.9434  0.9729    0.9579  33997.0
OFFENSIVE        0.6821  0.4508    0.5428   2094.0
HATE             0.7436  0.6843    0.7127   4441.0
macro avg        0.7897  0.7027    0.7378  40532.0
weighted avg     0.9080  0.9143    0.9096  40532.0

Confusion Matrix:
[[33077   253   667]
 [  769   944   381]
 [ 1215   187  3039]]

================================================================================
============================================================
EVALUATION COMPLETED!
============================================================
Successfully evaluated: 11/11 models

Best performing models:
  1. vihate-t5:   Accuracy=0.9551, F1=0.8718
  2. roberta-gru: Accuracy=0.9537, F1=0.8716
  3. phobert-v1:  Accuracy=0.9421, F1=0.8308
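Note: the per-class precision and recall figures in each classification report can be recovered directly from the logged confusion matrices (rows are true classes, columns are predicted classes). A minimal sketch, assuming NumPy is available, using phobert-v1's matrix from above:

```python
import numpy as np

# phobert-v1 confusion matrix from the log: rows = true, columns = predicted
cm = np.array([[33548,   196,   253],
               [  552,  1378,   164],
               [ 1013,   168,  3260]])
classes = ["CLEAN", "OFFENSIVE", "HATE"]

recall = cm.diagonal() / cm.sum(axis=1)     # per true class: correct / row total
precision = cm.diagonal() / cm.sum(axis=0)  # per predicted class: correct / column total

for name, p, r in zip(classes, precision, recall):
    print(f"{name:<10} precision={p:.4f} recall={r:.4f}")
# CLEAN      precision=0.9554 recall=0.9868  -- matches the report above
```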
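The identical bilstm and textcnn rows follow from the "dummy predictions" fallback: predicting the majority class (CLEAN) for every sample reproduces their metrics exactly. A minimal sketch, assuming scikit-learn is installed, with class counts taken from the support column of the reports above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Label ids 0=CLEAN, 1=OFFENSIVE, 2=HATE; counts from the log's support column
y_true = np.repeat([0, 1, 2], [33997, 2094, 4441])
y_pred = np.zeros_like(y_true)  # always predict the majority class, CLEAN

print(f"Accuracy:    {accuracy_score(y_true, y_pred):.4f}")                             # 0.8388
print(f"F1 Macro:    {f1_score(y_true, y_pred, average='macro', zero_division=0):.4f}") # 0.3041
print(f"F1 Weighted: {f1_score(y_true, y_pred, average='weighted', zero_division=0):.4f}")  # 0.7652
```

This confirms the bilstm/textcnn rows measure only class imbalance, not model quality, so they should be excluded from any model comparison until those two models get real inference paths.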