visolex/textcnn-hsd

+EVALUATION LOG - 2025-10-29 03:44:41
+================================================================================
+STARTING POST-TRAINING EVALUATION
+================================================================================
+✅ Test data loaded: 40532 samples
+   Columns: ['dataset', 'type', 'comment', 'label']
+Using device: cuda
+============================================================
+EVALUATING MODEL: PHOBERT-V1
+============================================================
+✅ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1
+✅ Tokenizer loaded for phobert-v1
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9421
+   F1 Macro: 0.8308
+   F1 Weighted: 0.9394
+============================================================
+EVALUATING MODEL: PHOBERT-V2
+============================================================
+✅ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2
+✅ Tokenizer loaded for phobert-v2
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9341
+   F1 Macro: 0.8048
+   F1 Weighted: 0.9326
+============================================================
+EVALUATING MODEL: BARTPHO
+============================================================
+✅ Model bartpho loaded from outputs/hate-speech-detection/bartpho
+✅ Tokenizer loaded for bartpho
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.8985
+   F1 Macro: 0.6791
+   F1 Weighted: 0.8886
+============================================================
+EVALUATING MODEL: VISOBERT
+============================================================
+✅ Model visobert loaded from outputs/hate-speech-detection/visobert
+✅ Tokenizer loaded for visobert
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9372
+   F1 Macro: 0.8241
+   F1 Weighted: 0.9379
+============================================================
+EVALUATING MODEL: VIHATE-T5
+============================================================
+✅ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5
+✅ Tokenizer loaded for vihate-t5
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9551
+   F1 Macro: 0.8718
+   F1 Weighted: 0.9535
+============================================================
+EVALUATING MODEL: XLM-R
+============================================================
+✅ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r
+✅ Tokenizer loaded for xlm-r
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9203
+   F1 Macro: 0.7625
+   F1 Weighted: 0.9177
+============================================================
+EVALUATING MODEL: ROBERTA-GRU
+============================================================
+✅ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru
+✅ Tokenizer loaded for roberta-gru
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9537
+   F1 Macro: 0.8716
+   F1 Weighted: 0.9530
+============================================================
+EVALUATING MODEL: BILSTM
+============================================================
+✅ Model bilstm loaded from outputs/hate-speech-detection/bilstm
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+ℹ️  BILSTM evaluation requires special handling
+Using dummy predictions for BILSTM
+✅ Evaluation completed!
+   Accuracy: 0.8388
+   F1 Macro: 0.3041
+   F1 Weighted: 0.7652
+============================================================
+EVALUATING MODEL: TEXTCNN
+============================================================
+✅ Model textcnn loaded from outputs/hate-speech-detection/textcnn
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+ℹ️  TEXTCNN evaluation requires special handling
+Using dummy predictions for TEXTCNN
+✅ Evaluation completed!
+   Accuracy: 0.8388
+   F1 Macro: 0.3041
+   F1 Weighted: 0.7652
+============================================================
+EVALUATING MODEL: MBERT
+============================================================
+✅ Model mbert loaded from outputs/hate-speech-detection/mbert
+✅ Tokenizer loaded for mbert
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9360
+   F1 Macro: 0.8044
+   F1 Weighted: 0.9317
+============================================================
+EVALUATING MODEL: SPHOBERT
+============================================================
+✅ Model sphobert loaded from outputs/hate-speech-detection/sphobert
+✅ Tokenizer loaded for sphobert
+Evaluating on 40532 samples...
+Text column: comment, Label column: label
+✅ Evaluation completed!
+   Accuracy: 0.9143
+   F1 Macro: 0.7378
+   F1 Weighted: 0.9096
+================================================================================
+FINAL EVALUATION RESULTS - 2025-10-29 04:14:15
+================================================================================
+EVALUATION SUMMARY
+--------------------------------------------------
+Model                Accuracy   F1 Macro   F1 Weighted  Samples
+--------------------------------------------------
+phobert-v1           0.9421     0.8308     0.9394       40532
+phobert-v2           0.9341     0.8048     0.9326       40532
+bartpho              0.8985     0.6791     0.8886       40532
+visobert             0.9372     0.8241     0.9379       40532
+vihate-t5            0.9551     0.8718     0.9535       40532
+xlm-r                0.9203     0.7625     0.9177       40532
+roberta-gru          0.9537     0.8716     0.9530       40532
+bilstm               0.8388     0.3041     0.7652       40532
+textcnn              0.8388     0.3041     0.7652       40532
+mbert                0.9360     0.8044     0.9317       40532
+sphobert             0.9143     0.7378     0.9096       40532
+================================================================================
+DETAILED RESULTS - PHOBERT-V1
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/phobert-v1
+Number of Samples: 40532
+Accuracy: 0.9421
+F1 Macro: 0.8308
+F1 Weighted: 0.9394
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9554     0.9868     0.9709     33997.0
+OFFENSIVE    0.7910     0.6581     0.7185     2094.0
+HATE         0.8866     0.7341     0.8032     4441.0
+macro avg    0.8777     0.7930     0.8308     40532.0
+weighted avg 0.9394     0.9421     0.9394     40532.0
+Confusion Matrix:
+[[33548   196   253]
+ [  552  1378   164]
+ [ 1013   168  3260]]
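The per-class and averaged scores in this report can be recomputed from the confusion matrix alone. The sketch below assumes rows are true labels and columns are predicted labels, in the order CLEAN, OFFENSIVE, HATE (the row sums match the Support column, which is consistent with that reading). The helper names are illustrative and are not taken from the evaluation script.

```python
# Confusion matrix for phobert-v1, copied from the log.
# Assumption: rows = true label, columns = predicted label.
LABELS = ["CLEAN", "OFFENSIVE", "HATE"]
CM = [
    [33548, 196, 253],
    [552, 1378, 164],
    [1013, 168, 3260],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1), one tuple per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        actual = sum(cm[k])                          # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def summarize(cm):
    """Return (accuracy, macro F1, support-weighted F1)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    supports = [sum(row) for row in cm]
    metrics = per_class_metrics(cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    macro_f1 = sum(m[2] for m in metrics) / n
    weighted_f1 = sum(m[2] * s for m, s in zip(metrics, supports)) / total
    return accuracy, macro_f1, weighted_f1


accuracy, macro_f1, weighted_f1 = summarize(CM)
for label, (p, r, f) in zip(LABELS, per_class_metrics(CM)):
    print(f"{label:<10} {p:.4f} {r:.4f} {f:.4f}")
print(f"Accuracy: {accuracy:.4f}  F1 Macro: {macro_f1:.4f}  F1 Weighted: {weighted_f1:.4f}")
```

Macro F1 weights the three classes equally, which is why it sits well below accuracy here: the minority OFFENSIVE and HATE classes have much lower recall than CLEAN.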
+================================================================================
+DETAILED RESULTS - PHOBERT-V2
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/phobert-v2
+Number of Samples: 40532
+Accuracy: 0.9341
+F1 Macro: 0.8048
+F1 Weighted: 0.9326
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9635     0.9739     0.9687     33997.0
+OFFENSIVE    0.7505     0.5903     0.6608     2094.0
+HATE         0.7779     0.7919     0.7849     4441.0
+macro avg    0.8306     0.7854     0.8048     40532.0
+weighted avg 0.9321     0.9341     0.9326     40532.0
+Confusion Matrix:
+[[33109   219   669]
+ [  523  1236   335]
+ [  732   192  3517]]
+================================================================================
+DETAILED RESULTS - BARTPHO
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/bartpho
+Number of Samples: 40532
+Accuracy: 0.8985
+F1 Macro: 0.6791
+F1 Weighted: 0.8886
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9228     0.9770     0.9491     33997.0
+OFFENSIVE    0.6527     0.3563     0.4609     2094.0
+HATE         0.7238     0.5535     0.6273     4441.0
+macro avg    0.7664     0.6289     0.6791     40532.0
+weighted avg 0.8871     0.8985     0.8886     40532.0
+Confusion Matrix:
+[[33215   235   547]
+ [  957   746   391]
+ [ 1821   162  2458]]
+================================================================================
+DETAILED RESULTS - VISOBERT
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/visobert
+Number of Samples: 40532
+Accuracy: 0.9372
+F1 Macro: 0.8241
+F1 Weighted: 0.9379
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9714     0.9687     0.9700     33997.0
+OFFENSIVE    0.6463     0.7574     0.6974     2094.0
+HATE         0.8305     0.7809     0.8049     4441.0
+macro avg    0.8160     0.8357     0.8241     40532.0
+weighted avg 0.9392     0.9372     0.9379     40532.0
+Confusion Matrix:
+[[32932   590   475]
+ [  275  1586   233]
+ [  695   278  3468]]
+================================================================================
+DETAILED RESULTS - VIHATE-T5
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/vihate-t5
+Number of Samples: 40532
+Accuracy: 0.9551
+F1 Macro: 0.8718
+F1 Weighted: 0.9535
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9660     0.9883     0.9770     33997.0
+OFFENSIVE    0.8788     0.7096     0.7852     2094.0
+HATE         0.8931     0.8165     0.8531     4441.0
+macro avg    0.9126     0.8381     0.8718     40532.0
+weighted avg 0.9535     0.9551     0.9535     40532.0
+Confusion Matrix:
+[[33599   124   274]
+ [  448  1486   160]
+ [  734    81  3626]]
+================================================================================
+DETAILED RESULTS - XLM-R
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/xlm-r
+Number of Samples: 40532
+Accuracy: 0.9203
+F1 Macro: 0.7625
+F1 Weighted: 0.9177
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9514     0.9733     0.9622     33997.0
+OFFENSIVE    0.6284     0.5702     0.5979     2094.0
+HATE         0.7834     0.6791     0.7275     4441.0
+macro avg    0.7877     0.7409     0.7625     40532.0
+weighted avg 0.9163     0.9203     0.9177     40532.0
+Confusion Matrix:
+[[33090   418   489]
+ [  555  1194   345]
+ [ 1137   288  3016]]
+================================================================================
+DETAILED RESULTS - ROBERTA-GRU
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/roberta-gru
+Number of Samples: 40532
+Accuracy: 0.9537
+F1 Macro: 0.8716
+F1 Weighted: 0.9530
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9711     0.9825     0.9768     33997.0
+OFFENSIVE    0.8136     0.7693     0.7909     2094.0
+HATE         0.8761     0.8201     0.8472     4441.0
+macro avg    0.8870     0.8573     0.8716     40532.0
+weighted avg 0.9526     0.9537     0.9530     40532.0
+Confusion Matrix:
+[[33402   237   358]
+ [  326  1611   157]
+ [  667   132  3642]]
+================================================================================
+DETAILED RESULTS - BILSTM
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/bilstm
+Number of Samples: 40532
+Accuracy: 0.8388
+F1 Macro: 0.3041
+F1 Weighted: 0.7652
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.8388     1.0000     0.9123     33997.0
+OFFENSIVE    0.0000     0.0000     0.0000     2094.0
+HATE         0.0000     0.0000     0.0000     4441.0
+macro avg    0.2796     0.3333     0.3041     40532.0
+weighted avg 0.7035     0.8388     0.7652     40532.0
+Confusion Matrix:
+[[33997     0     0]
+ [ 2094     0     0]
+ [ 4441     0     0]]
+================================================================================
+DETAILED RESULTS - TEXTCNN
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/textcnn
+Number of Samples: 40532
+Accuracy: 0.8388
+F1 Macro: 0.3041
+F1 Weighted: 0.7652
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.8388     1.0000     0.9123     33997.0
+OFFENSIVE    0.0000     0.0000     0.0000     2094.0
+HATE         0.0000     0.0000     0.0000     4441.0
+macro avg    0.2796     0.3333     0.3041     40532.0
+weighted avg 0.7035     0.8388     0.7652     40532.0
+Confusion Matrix:
+[[33997     0     0]
+ [ 2094     0     0]
+ [ 4441     0     0]]
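The identical BILSTM and TEXTCNN figures are what a constant predictor would score on this test set. A minimal sketch, assuming the "dummy predictions" label every sample CLEAN (consistent with the confusion matrices above, where only the first column is non-zero); the function name is hypothetical and not part of the evaluation script.

```python
# Class supports copied from the classification report in the log.
SUPPORT = {"CLEAN": 33997, "OFFENSIVE": 2094, "HATE": 4441}


def constant_baseline(support, predicted_label):
    """Score a predictor that outputs `predicted_label` for every sample.

    Returns (accuracy, macro F1, support-weighted F1).
    """
    total = sum(support.values())
    f1 = {}
    for label, n in support.items():
        if label == predicted_label:
            precision = n / total  # every sample is predicted as this label
            recall = 1.0           # so none of its true samples are missed
            f1[label] = 2 * precision * recall / (precision + recall)
        else:
            f1[label] = 0.0        # never predicted: no true positives
    accuracy = support[predicted_label] / total
    macro_f1 = sum(f1.values()) / len(f1)
    weighted_f1 = sum(f1[label] * support[label] for label in support) / total
    return accuracy, macro_f1, weighted_f1


base_accuracy, base_macro_f1, base_weighted_f1 = constant_baseline(SUPPORT, "CLEAN")
print(f"Accuracy: {base_accuracy:.4f}")        # Accuracy: 0.8388
print(f"F1 Macro: {base_macro_f1:.4f}")        # F1 Macro: 0.3041
print(f"F1 Weighted: {base_weighted_f1:.4f}")  # F1 Weighted: 0.7652
```

Under that assumption, these two rows describe the class imbalance of the test set rather than the trained BiLSTM or TextCNN weights, and 0.8388 accuracy is the floor any real model has to beat.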
+================================================================================
+DETAILED RESULTS - MBERT
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/mbert
+Number of Samples: 40532
+Accuracy: 0.9360
+F1 Macro: 0.8044
+F1 Weighted: 0.9317
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9489     0.9876     0.9679     33997.0
+OFFENSIVE    0.8645     0.5392     0.6641     2094.0
+HATE         0.8416     0.7287     0.7811     4441.0
+macro avg    0.8850     0.7518     0.8044     40532.0
+weighted avg 0.9328     0.9360     0.9317     40532.0
+Confusion Matrix:
+[[33574    93   330]
+ [  686  1129   279]
+ [ 1121    84  3236]]
+================================================================================
+DETAILED RESULTS - SPHOBERT
+--------------------------------------------------
+Model Path: outputs/hate-speech-detection/sphobert
+Number of Samples: 40532
+Accuracy: 0.9143
+F1 Macro: 0.7378
+F1 Weighted: 0.9096
+Classification Report:
+Class        Precision  Recall     F1-Score   Support
+--------------------------------------------------
+CLEAN        0.9434     0.9729     0.9579     33997.0
+OFFENSIVE    0.6821     0.4508     0.5428     2094.0
+HATE         0.7436     0.6843     0.7127     4441.0
+macro avg    0.7897     0.7027     0.7378     40532.0
+weighted avg 0.9080     0.9143     0.9096     40532.0
+Confusion Matrix:
+[[33077   253   667]
+ [  769   944   381]
+ [ 1215   187  3039]]
+================================================================================
+============================================================
+EVALUATION COMPLETED!
+============================================================
+Successfully evaluated: 11/11 models (bilstm and textcnn were scored on dummy predictions, not model output)
+Best performing models:
+  1. vihate-t5: Accuracy=0.9551, F1=0.8718
+  2. roberta-gru: Accuracy=0.9537, F1=0.8716
+  3. phobert-v1: Accuracy=0.9421, F1=0.8308
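The top-three list can be reproduced from the summary table. The log does not say which metric the ranking uses, and accuracy and macro F1 happen to give the same top three here; the sketch below assumes macro F1 as the sort key. `SUMMARY`, `DUMMY`, and `top_models` are illustrative names, not identifiers from the evaluation script.

```python
# (accuracy, macro F1) per model, copied from the EVALUATION SUMMARY table.
SUMMARY = {
    "phobert-v1": (0.9421, 0.8308),
    "phobert-v2": (0.9341, 0.8048),
    "bartpho": (0.8985, 0.6791),
    "visobert": (0.9372, 0.8241),
    "vihate-t5": (0.9551, 0.8718),
    "xlm-r": (0.9203, 0.7625),
    "roberta-gru": (0.9537, 0.8716),
    "bilstm": (0.8388, 0.3041),
    "textcnn": (0.8388, 0.3041),
    "mbert": (0.9360, 0.8044),
    "sphobert": (0.9143, 0.7378),
}

# Models the log reports as scored on dummy predictions.
DUMMY = frozenset({"bilstm", "textcnn"})


def top_models(summary, k=3, exclude=frozenset()):
    """Return the k best (name, accuracy, macro_f1) rows, sorted by macro F1."""
    rows = [
        (name, acc, f1)
        for name, (acc, f1) in summary.items()
        if name not in exclude
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:k]


for rank, (name, acc, f1) in enumerate(top_models(SUMMARY, exclude=DUMMY), start=1):
    print(f"  {rank}. {name}: Accuracy={acc:.4f}, F1={f1:.4f}")
```

Excluding the dummy-scored models does not change the top three, but it keeps placeholder numbers out of any comparison that looks further down the ranking.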