Update evaluation results
README.md
@@ -90,17 +90,16 @@ For multi-label prediction, a uniform decision threshold of 0.3 is applied acros

 ## Results

-On the held-out test set, the MoE classifier
-
-
-
-
-
-
-
-
-
-Per-class F1 improves across all five diagnostic categories, with the largest gains observed for myocardial infarction and hypertrophy. Confusion matrix analysis indicates that the MLP baseline tends to trade precision for recall, producing more false positives and a lower overall F1. For this reason, the MoE classifier is used in the final application.
+On the held-out test set, the MoE classifier continues to outperform the MLP baseline on overall metrics. It achieves:
+
+- Lower Hamming loss: 0.167 vs 0.172
+- Higher ROC-AUC:
+  - Micro: 0.895 vs 0.890
+  - Macro: 0.878 vs 0.872
+- Higher F1 scores:
+  - Micro: 0.700 vs 0.692
+  - Macro: 0.661 vs 0.655
+Per-class F1 shows stronger MoE performance on most diagnostic categories, with the largest gain observed for myocardial infarction. Confusion-matrix patterns still show the MLP baseline trading precision for recall on several labels, which lowers its overall F1. For this reason, the MoE classifier is used in the final application.

 ---

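The updated Results list mixes threshold-dependent metrics (Hamming loss, micro/macro F1) with a threshold-free one (ROC-AUC). The sketch below shows how these standard multi-label metrics are typically computed. It is not the repository's evaluation code: it assumes per-label probabilities and 0/1 ground truth as nested Python lists, applies the README's uniform 0.3 threshold as `score >= 0.3` (the exact comparison is an assumption), and every function name is hypothetical. scikit-learn's `hamming_loss`, `f1_score` and `roc_auc_score` with `average="micro"` or `average="macro"` implement the same definitions.

```python
"""Hypothetical sketch of the multi-label metrics listed under Results.

Assumptions: y_true is [n_samples][n_labels] of 0/1, scores holds per-label
probabilities of the same shape, and the 0.3 threshold is applied as >=.
"""
from typing import List, Sequence

THRESHOLD = 0.3  # uniform decision threshold, assumed to be inclusive


def binarize(scores: Sequence[Sequence[float]], t: float = THRESHOLD) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions."""
    return [[1 if s >= t else 0 for s in row] for row in scores]


def hamming_loss(y_true, y_pred) -> float:
    """Fraction of individual (sample, label) slots predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(a != b for rt, rp in zip(y_true, y_pred) for a, b in zip(rt, rp))
    return wrong / total


def _counts(y_true, y_pred, j):
    """True positives, false positives and false negatives for label j."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        if rp[j] and rt[j]:
            tp += 1
        elif rp[j]:
            fp += 1
        elif rt[j]:
            fn += 1
    return tp, fp, fn


def _f1(tp, fp, fn) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label is never seen."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred) -> float:
    """Pool TP/FP/FN over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for j in range(len(y_true[0])):
        a, b, c = _counts(y_true, y_pred, j)
        tp, fp, fn = tp + a, fp + b, fn + c
    return _f1(tp, fp, fn)


def macro_f1(y_true, y_pred) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    return sum(_f1(*_counts(y_true, y_pred, j)) for j in range(n_labels)) / n_labels


def _auc(labels, scores) -> float:
    """ROC-AUC as the Mann-Whitney statistic: P(pos score > neg score), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC is undefined when a label has only one class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def micro_auc(y_true, scores) -> float:
    """Flatten every (sample, label) pair into one binary problem."""
    flat_y = [y for row in y_true for y in row]
    flat_s = [s for row in scores for s in row]
    return _auc(flat_y, flat_s)


def macro_auc(y_true, scores) -> float:
    """Average the per-label AUCs with equal weight."""
    n_labels = len(y_true[0])
    cols = [([r[j] for r in y_true], [r[j] for r in scores]) for j in range(n_labels)]
    return sum(_auc(y, s) for y, s in cols) / n_labels


if __name__ == "__main__":
    # Toy data (made up for illustration): 4 samples, 2 labels.
    y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.8, 0.2], [0.4, 0.6], [0.25, 0.9], [0.1, 0.35]]
    y_pred = binarize(scores)
    print(round(hamming_loss(y_true, y_pred), 3))  # 0.375
    print(round(micro_f1(y_true, y_pred), 3))      # 0.667
    print(round(macro_f1(y_true, y_pred), 3))      # 0.65
    print(micro_auc(y_true, scores), macro_auc(y_true, scores))  # 0.875 0.875
```

Note that ROC-AUC is computed from the raw scores, so changing the 0.3 threshold moves Hamming loss and F1 but not AUC. Micro averaging pools every decision, so frequent labels dominate; macro averaging weights each diagnostic category equally, so a macro score below the micro score, as in the Results list, is consistent with weaker performance on less frequent labels. The pairwise AUC loop is quadratic and meant for clarity, not for large test sets.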