EVALUATION LOG - 2025-10-29 03:44:41
================================================================================
================================================================================
STARTING POST-TRAINING EVALUATION
================================================================================
✓ Test data loaded: 40532 samples
Columns: ['dataset', 'type', 'comment', 'label']
Using device: cuda
============================================================
EVALUATING MODEL: PHOBERT-V1
============================================================
✓ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1
✓ Tokenizer loaded for phobert-v1
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9421
F1 Macro: 0.8308
F1 Weighted: 0.9394
============================================================
EVALUATING MODEL: PHOBERT-V2
============================================================
✓ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2
✓ Tokenizer loaded for phobert-v2
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9341
F1 Macro: 0.8048
F1 Weighted: 0.9326
============================================================
EVALUATING MODEL: BARTPHO
============================================================
✓ Model bartpho loaded from outputs/hate-speech-detection/bartpho
✓ Tokenizer loaded for bartpho
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.8985
F1 Macro: 0.6791
F1 Weighted: 0.8886
============================================================
EVALUATING MODEL: VISOBERT
============================================================
✓ Model visobert loaded from outputs/hate-speech-detection/visobert
✓ Tokenizer loaded for visobert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9372
F1 Macro: 0.8241
F1 Weighted: 0.9379
============================================================
EVALUATING MODEL: VIHATE-T5
============================================================
✓ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5
✓ Tokenizer loaded for vihate-t5
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9551
F1 Macro: 0.8718
F1 Weighted: 0.9535
============================================================
EVALUATING MODEL: XLM-R
============================================================
✓ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r
✓ Tokenizer loaded for xlm-r
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9203
F1 Macro: 0.7625
F1 Weighted: 0.9177
============================================================
EVALUATING MODEL: ROBERTA-GRU
============================================================
✓ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru
✓ Tokenizer loaded for roberta-gru
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9537
F1 Macro: 0.8716
F1 Weighted: 0.9530
============================================================
EVALUATING MODEL: BILSTM
============================================================
✓ Model bilstm loaded from outputs/hate-speech-detection/bilstm
Evaluating on 40532 samples...
Text column: comment, Label column: label
ℹ️ BILSTM evaluation requires special handling
Using dummy predictions for BILSTM
✓ Evaluation completed!
Accuracy: 0.8388
F1 Macro: 0.3041
F1 Weighted: 0.7652
============================================================
EVALUATING MODEL: TEXTCNN
============================================================
✓ Model textcnn loaded from outputs/hate-speech-detection/textcnn
Evaluating on 40532 samples...
Text column: comment, Label column: label
ℹ️ TEXTCNN evaluation requires special handling
Using dummy predictions for TEXTCNN
✓ Evaluation completed!
Accuracy: 0.8388
F1 Macro: 0.3041
F1 Weighted: 0.7652
============================================================
EVALUATING MODEL: MBERT
============================================================
✓ Model mbert loaded from outputs/hate-speech-detection/mbert
✓ Tokenizer loaded for mbert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9360
F1 Macro: 0.8044
F1 Weighted: 0.9317
============================================================
EVALUATING MODEL: SPHOBERT
============================================================
✓ Model sphobert loaded from outputs/hate-speech-detection/sphobert
✓ Tokenizer loaded for sphobert
Evaluating on 40532 samples...
Text column: comment, Label column: label
✓ Evaluation completed!
Accuracy: 0.9143
F1 Macro: 0.7378
F1 Weighted: 0.9096
================================================================================
FINAL EVALUATION RESULTS - 2025-10-29 04:14:15
================================================================================
EVALUATION SUMMARY
---------------------------------------------------------
Model         Accuracy   F1 Macro   F1 Weighted   Samples
---------------------------------------------------------
phobert-v1      0.9421     0.8308        0.9394     40532
phobert-v2      0.9341     0.8048        0.9326     40532
bartpho         0.8985     0.6791        0.8886     40532
visobert        0.9372     0.8241        0.9379     40532
vihate-t5       0.9551     0.8718        0.9535     40532
xlm-r           0.9203     0.7625        0.9177     40532
roberta-gru     0.9537     0.8716        0.9530     40532
bilstm          0.8388     0.3041        0.7652     40532
textcnn         0.8388     0.3041        0.7652     40532
mbert           0.9360     0.8044        0.9317     40532
sphobert        0.9143     0.7378        0.9096     40532
================================================================================
DETAILED RESULTS - PHOBERT-V1
--------------------------------------------------
Model Path: outputs/hate-speech-detection/phobert-v1
Number of Samples: 40532
Accuracy: 0.9421
F1 Macro: 0.8308
F1 Weighted: 0.9394
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9554   0.9868     0.9709     33997
OFFENSIVE         0.7910   0.6581     0.7185      2094
HATE              0.8866   0.7341     0.8032      4441
macro avg         0.8777   0.7930     0.8308     40532
weighted avg      0.9394   0.9421     0.9394     40532
Confusion Matrix:
[[33548   196   253]
 [  552  1378   164]
 [ 1013   168  3260]]
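Every summary number in a block like this can be recomputed from its confusion matrix alone; a minimal NumPy sketch, using the phobert-v1 matrix above (the class order CLEAN/OFFENSIVE/HATE and 4-digit rounding are assumptions read off this report):

```python
import numpy as np

# Confusion matrix for phobert-v1 as reported in the log
# (rows = true class, columns = predicted class; order: CLEAN, OFFENSIVE, HATE)
cm = np.array([[33548,   196,   253],
               [  552,  1378,   164],
               [ 1013,   168,  3260]])

support = cm.sum(axis=1)           # true samples per class: 33997, 2094, 4441
tp = np.diag(cm).astype(float)     # correctly classified samples per class
precision = tp / cm.sum(axis=0)    # column sums = total predictions per class
recall = tp / support
f1 = 2 * precision * recall / (precision + recall)

accuracy = tp.sum() / cm.sum()
f1_macro = f1.mean()                               # unweighted mean over classes
f1_weighted = (f1 * support).sum() / support.sum() # support-weighted mean

print(f"Accuracy:    {accuracy:.4f}")     # 0.9421
print(f"F1 Macro:    {f1_macro:.4f}")     # 0.8308
print(f"F1 Weighted: {f1_weighted:.4f}")  # 0.9394
```

The gap between F1 macro (0.8308) and F1 weighted (0.9394) reflects the heavy class imbalance: CLEAN is about 84% of the test set, so the weighted average is dominated by the majority class.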
================================================================================
DETAILED RESULTS - PHOBERT-V2
--------------------------------------------------
Model Path: outputs/hate-speech-detection/phobert-v2
Number of Samples: 40532
Accuracy: 0.9341
F1 Macro: 0.8048
F1 Weighted: 0.9326
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9635   0.9739     0.9687     33997
OFFENSIVE         0.7505   0.5903     0.6608      2094
HATE              0.7779   0.7919     0.7849      4441
macro avg         0.8306   0.7854     0.8048     40532
weighted avg      0.9321   0.9341     0.9326     40532
Confusion Matrix:
[[33109   219   669]
 [  523  1236   335]
 [  732   192  3517]]
================================================================================
DETAILED RESULTS - BARTPHO
--------------------------------------------------
Model Path: outputs/hate-speech-detection/bartpho
Number of Samples: 40532
Accuracy: 0.8985
F1 Macro: 0.6791
F1 Weighted: 0.8886
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9228   0.9770     0.9491     33997
OFFENSIVE         0.6527   0.3563     0.4609      2094
HATE              0.7238   0.5535     0.6273      4441
macro avg         0.7664   0.6289     0.6791     40532
weighted avg      0.8871   0.8985     0.8886     40532
Confusion Matrix:
[[33215   235   547]
 [  957   746   391]
 [ 1821   162  2458]]
================================================================================
DETAILED RESULTS - VISOBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/visobert
Number of Samples: 40532
Accuracy: 0.9372
F1 Macro: 0.8241
F1 Weighted: 0.9379
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9714   0.9687     0.9700     33997
OFFENSIVE         0.6463   0.7574     0.6974      2094
HATE              0.8305   0.7809     0.8049      4441
macro avg         0.8160   0.8357     0.8241     40532
weighted avg      0.9392   0.9372     0.9379     40532
Confusion Matrix:
[[32932   590   475]
 [  275  1586   233]
 [  695   278  3468]]
================================================================================
DETAILED RESULTS - VIHATE-T5
--------------------------------------------------
Model Path: outputs/hate-speech-detection/vihate-t5
Number of Samples: 40532
Accuracy: 0.9551
F1 Macro: 0.8718
F1 Weighted: 0.9535
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9660   0.9883     0.9770     33997
OFFENSIVE         0.8788   0.7096     0.7852      2094
HATE              0.8931   0.8165     0.8531      4441
macro avg         0.9126   0.8381     0.8718     40532
weighted avg      0.9535   0.9551     0.9535     40532
Confusion Matrix:
[[33599   124   274]
 [  448  1486   160]
 [  734    81  3626]]
================================================================================
DETAILED RESULTS - XLM-R
--------------------------------------------------
Model Path: outputs/hate-speech-detection/xlm-r
Number of Samples: 40532
Accuracy: 0.9203
F1 Macro: 0.7625
F1 Weighted: 0.9177
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9514   0.9733     0.9622     33997
OFFENSIVE         0.6284   0.5702     0.5979      2094
HATE              0.7834   0.6791     0.7275      4441
macro avg         0.7877   0.7409     0.7625     40532
weighted avg      0.9163   0.9203     0.9177     40532
Confusion Matrix:
[[33090   418   489]
 [  555  1194   345]
 [ 1137   288  3016]]
================================================================================
DETAILED RESULTS - ROBERTA-GRU
--------------------------------------------------
Model Path: outputs/hate-speech-detection/roberta-gru
Number of Samples: 40532
Accuracy: 0.9537
F1 Macro: 0.8716
F1 Weighted: 0.9530
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9711   0.9825     0.9768     33997
OFFENSIVE         0.8136   0.7693     0.7909      2094
HATE              0.8761   0.8201     0.8472      4441
macro avg         0.8870   0.8573     0.8716     40532
weighted avg      0.9526   0.9537     0.9530     40532
Confusion Matrix:
[[33402   237   358]
 [  326  1611   157]
 [  667   132  3642]]
================================================================================
DETAILED RESULTS - BILSTM
--------------------------------------------------
Model Path: outputs/hate-speech-detection/bilstm
Number of Samples: 40532
Accuracy: 0.8388
F1 Macro: 0.3041
F1 Weighted: 0.7652
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.8388   1.0000     0.9123     33997
OFFENSIVE         0.0000   0.0000     0.0000      2094
HATE              0.0000   0.0000     0.0000      4441
macro avg         0.2796   0.3333     0.3041     40532
weighted avg      0.7035   0.8388     0.7652     40532
Confusion Matrix:
[[33997     0     0]
 [ 2094     0     0]
 [ 4441     0     0]]
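The BILSTM (and identical TEXTCNN) numbers follow mechanically from the all-CLEAN dummy predictions noted earlier in the log: when every sample is assigned the majority class, all metrics are determined by the class supports alone. A quick arithmetic check:

```python
# Majority-class collapse: every one of the 40532 samples is predicted CLEAN,
# so the confusion matrix has a single non-zero column.
support = {"CLEAN": 33997, "OFFENSIVE": 2094, "HATE": 4441}
total = sum(support.values())                      # 40532

accuracy = support["CLEAN"] / total                # 0.8388
precision = support["CLEAN"] / total               # every prediction is CLEAN
recall = 1.0                                       # every CLEAN sample is caught
f1_clean = 2 * precision * recall / (precision + recall)   # 0.9123
f1_macro = f1_clean / 3                            # 0.3041 (other classes score 0)
f1_weighted = f1_clean * support["CLEAN"] / total  # 0.7652

print(f"{accuracy:.4f} {f1_macro:.4f} {f1_weighted:.4f}")  # 0.8388 0.3041 0.7652
```

This makes the 0.8388 accuracy a pure artifact of class imbalance rather than a measure of model quality; the macro F1 of 0.3041 is the more honest signal here.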
================================================================================
DETAILED RESULTS - TEXTCNN
--------------------------------------------------
Model Path: outputs/hate-speech-detection/textcnn
Number of Samples: 40532
Accuracy: 0.8388
F1 Macro: 0.3041
F1 Weighted: 0.7652
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.8388   1.0000     0.9123     33997
OFFENSIVE         0.0000   0.0000     0.0000      2094
HATE              0.0000   0.0000     0.0000      4441
macro avg         0.2796   0.3333     0.3041     40532
weighted avg      0.7035   0.8388     0.7652     40532
Confusion Matrix:
[[33997     0     0]
 [ 2094     0     0]
 [ 4441     0     0]]
================================================================================
DETAILED RESULTS - MBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/mbert
Number of Samples: 40532
Accuracy: 0.9360
F1 Macro: 0.8044
F1 Weighted: 0.9317
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9489   0.9876     0.9679     33997
OFFENSIVE         0.8645   0.5392     0.6641      2094
HATE              0.8416   0.7287     0.7811      4441
macro avg         0.8850   0.7518     0.8044     40532
weighted avg      0.9328   0.9360     0.9317     40532
Confusion Matrix:
[[33574    93   330]
 [  686  1129   279]
 [ 1121    84  3236]]
================================================================================
DETAILED RESULTS - SPHOBERT
--------------------------------------------------
Model Path: outputs/hate-speech-detection/sphobert
Number of Samples: 40532
Accuracy: 0.9143
F1 Macro: 0.7378
F1 Weighted: 0.9096
Classification Report:
Class          Precision   Recall     F1-Score   Support
------------------------------------------------------
CLEAN             0.9434   0.9729     0.9579     33997
OFFENSIVE         0.6821   0.4508     0.5428      2094
HATE              0.7436   0.6843     0.7127      4441
macro avg         0.7897   0.7027     0.7378     40532
weighted avg      0.9080   0.9143     0.9096     40532
Confusion Matrix:
[[33077   253   667]
 [  769   944   381]
 [ 1215   187  3039]]
================================================================================
============================================================
EVALUATION COMPLETED!
============================================================
Successfully evaluated: 11/11 models
Best performing models:
1. vihate-t5: Accuracy=0.9551, F1=0.8718
2. roberta-gru: Accuracy=0.9537, F1=0.8716
3. phobert-v1: Accuracy=0.9421, F1=0.8308
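The top-3 list is consistent with sorting the summary table by accuracy; a small sketch reproducing the ranking from the figures in this log (the accuracy-based sort criterion is an assumption inferred from the ordering):

```python
# (accuracy, f1_macro) per model, copied from the evaluation summary above
summary = {
    "phobert-v1":  (0.9421, 0.8308),
    "phobert-v2":  (0.9341, 0.8048),
    "bartpho":     (0.8985, 0.6791),
    "visobert":    (0.9372, 0.8241),
    "vihate-t5":   (0.9551, 0.8718),
    "xlm-r":       (0.9203, 0.7625),
    "roberta-gru": (0.9537, 0.8716),
    "bilstm":      (0.8388, 0.3041),
    "textcnn":     (0.8388, 0.3041),
    "mbert":       (0.9360, 0.8044),
    "sphobert":    (0.9143, 0.7378),
}

# Rank by accuracy, descending, and keep the top three
top3 = sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True)[:3]
for rank, (name, (acc, f1)) in enumerate(top3, start=1):
    print(f"{rank}. {name}: Accuracy={acc:.4f}, F1={f1:.4f}")
```

Note that sorting by macro F1 instead would give the same top three here, since vihate-t5 and roberta-gru lead on both metrics.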