DiligentPenguinn committed on
Commit
4e0d1cb
·
verified ·
1 Parent(s): b91d0cf

Update evaluation results

Files changed (1)
  1. README.md +10 -11
README.md CHANGED
@@ -90,17 +90,16 @@ For multi-label prediction, a uniform decision threshold of 0.3 is applied acros
90
 
91
  ## Results
92
 
93
- On the held-out test set, the MoE classifier consistently outperforms the MLP baseline across all metrics. It achieves:
94
-
95
- - Lower Hamming loss: 0.167 vs 0.235
96
- - Higher ROC-AUC:
97
- - Micro: 0.891 vs 0.827
98
- - Macro: 0.879 vs 0.808
99
- - Higher F1 scores:
100
- - Micro: 0.70 vs 0.61
101
- - Macro: 0.67 vs 0.58
102
-
103
- Per-class F1 improves across all five diagnostic categories, with the largest gains observed for myocardial infarction and hypertrophy. Confusion matrix analysis indicates that the MLP baseline tends to trade precision for recall, producing more false positives and a lower overall F1. For this reason, the MoE classifier is used in the final application.
104
 
105
  ---
106
 
 
90
 
91
  ## Results
92
 
93
+ On the held-out test set, the MoE classifier continues to outperform the MLP baseline on overall metrics. It achieves:
94
+
95
+ Lower Hamming loss: 0.167 vs 0.172
96
+ Higher ROC-AUC:
97
+ Micro: 0.895 vs 0.890
98
+ Macro: 0.878 vs 0.872
99
+ Higher F1 scores:
100
+ Micro: 0.700 vs 0.692
101
+ Macro: 0.661 vs 0.655
102
+ Per-class F1 shows stronger MoE performance on most diagnostic categories, with the largest gain observed for myocardial infarction. Confusion-matrix patterns remain consistent with the MLP baseline tending to trade precision for recall in several labels, which lowers overall F1. For this reason, the MoE classifier is used in the final application.
 
103
 
104
  ---
105
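
The metrics compared above (Hamming loss, micro/macro ROC-AUC, micro/macro F1) can all be computed with scikit-learn. A minimal sketch on synthetic data, applying the uniform 0.3 decision threshold mentioned in the README; the arrays below are random placeholders for five diagnostic labels, not the model's actual outputs:

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss, roc_auc_score

# Synthetic stand-ins: 200 samples, 5 diagnostic labels (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 5))
# Scores loosely correlated with the true labels, clipped to [0, 1].
y_score = np.clip(y_true * 0.6 + rng.random((200, 5)) * 0.5, 0.0, 1.0)

# Uniform decision threshold of 0.3 applied across all labels.
y_pred = (y_score >= 0.3).astype(int)

print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("F1 (micro):     ", f1_score(y_true, y_pred, average="micro"))
print("F1 (macro):     ", f1_score(y_true, y_pred, average="macro"))
print("ROC-AUC (micro):", roc_auc_score(y_true, y_score, average="micro"))
print("ROC-AUC (macro):", roc_auc_score(y_true, y_score, average="macro"))
```

Note that ROC-AUC is threshold-free (it takes the raw scores), while Hamming loss and F1 depend on the 0.3 cutoff, which is why the two model variants can rank differently on the two kinds of metric.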