Upload MIMIC test evaluation results
README.md
CHANGED
|
@@ -98,34 +98,43 @@ print(report)
|
|
| 98 |
|
| 99 |
Frontal-only evaluation using `PA/AP` studies only.
|
| 100 |
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
|
| 106 |
-
|
|
| 107 |
-
|
|
| 108 |
-
|
|
| 109 |
-
|
|
| 110 |
-
|
|
| 111 |
-
|
|
| 112 |
-
|
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
|
| 119 |
-
| -
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
|
| 124 |
-
|
|
| 125 |
-
|
|
| 126 |
-
|
|
| 127 |
-
|
|
| 128 |
-
|
| 129 |
|
| 130 |
## Data
|
| 131 |
|
|
@@ -138,6 +147,15 @@ The final table will be populated when the planned training run is completed. Un
|
|
| 138 |
|
| 139 |
- Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
|
| 140 |
|
| 141 |
## Training Snapshot
|
| 142 |
|
| 143 |
- Run: `LAnA-v4`
|
|
@@ -173,4 +191,4 @@ The final table will be populated when the planned training run is completed. Un
|
|
| 173 |
|
| 174 |
- Set `HF_TOKEN` with permission to access the DINOv3 repositories required by this model before downloading or running inference.
|
| 175 |
- `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
|
| 176 |
-
- `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
|
|
|
|
| 98 |
|
| 99 |
Frontal-only evaluation using `PA/AP` studies only.
|
| 100 |
|
| 101 |
+
These comparison tables are refreshed across the full LAnA collection whenever any model in the collection is evaluated.
|
| 102 |
+
|
| 103 |
+
### Cross-Model Comparison: All Frontal Test Studies
|
| 104 |
+
|
| 105 |
+
| Metric | LAnA-MIMIC-CHEXPERT | LAnA-MIMIC | LAnA | LAnA-v2 | LAnA-v3 | LAnA-v4 (Model still training) |
|
| 106 |
+
| --- | --- | --- | --- | --- | --- | --- |
|
| 107 |
+
| Run status | `Completed` | `Completed` | `Completed` | `Completed` | `Completed` | `Model still training` |
|
| 108 |
+
| Number of studies | `3041` | `3041` | `3041` | `3041` | `3041` | `3041` |
|
| 109 |
+
| ROUGE-L | `0.1513` | `0.1653` | `0.1686` | `0.1670` | `0.1745` | `0.1693` |
|
| 110 |
+
| BLEU-1 | `0.1707` | `0.1916` | `0.2091` | `0.2174` | `0.2346` | `0.2266` |
|
| 111 |
+
| BLEU-4 | `0.0357` | `0.0386` | `0.0417` | `0.0417` | `0.0484` | `0.0441` |
|
| 112 |
+
| METEOR | `0.2079` | `0.2202` | `0.2298` | `0.2063` | `0.2129` | `0.2017` |
|
| 113 |
+
| RadGraph F1 | `0.0918` | `0.0921` | `0.1024` | `0.1057` | `0.0939` | `0.0814` |
|
| 114 |
+
| RadGraph entity F1 | `0.1399` | `0.1459` | `0.1587` | `0.1569` | `0.1441` | `0.1453` |
|
| 115 |
+
| RadGraph relation F1 | `0.1246` | `0.1322` | `0.1443` | `0.1474` | `0.1280` | `0.1311` |
|
| 116 |
+
| CheXpert F1 14-micro | `0.1829` | `0.1565` | `0.2116` | `0.1401` | `0.3116` | `0.2211` |
|
| 117 |
+
| CheXpert F1 5-micro | `0.2183` | `0.1530` | `0.2512` | `0.2506` | `0.2486` | `0.0592` |
|
| 118 |
+
| CheXpert F1 14-macro | `0.1095` | `0.0713` | `0.1095` | `0.0401` | `0.1363` | `0.0760` |
|
| 119 |
+
| CheXpert F1 5-macro | `0.1634` | `0.1007` | `0.1644` | `0.1004` | `0.1686` | `0.0354` |
|
| 120 |
+
|
| 121 |
+
### Cross-Model Comparison: Findings-Only Frontal Test Studies
|
| 122 |
+
|
| 123 |
+
| Metric | LAnA-MIMIC-CHEXPERT | LAnA-MIMIC | LAnA | LAnA-v2 | LAnA-v3 | LAnA-v4 (Model still training) |
|
| 124 |
+
| --- | --- | --- | --- | --- | --- | --- |
|
| 125 |
+
| Run status | `Completed` | `Completed` | `Completed` | `Completed` | `Completed` | `Model still training` |
|
| 126 |
+
| Number of studies | `2210` | `2210` | `2210` | `2210` | `2210` | `2210` |
|
| 127 |
+
| ROUGE-L | `0.1576` | `0.1720` | `0.1771` | `0.1771` | `0.1848` | `0.1776` |
|
| 128 |
+
| BLEU-1 | `0.1754` | `0.2003` | `0.2177` | `0.2263` | `0.2480` | `0.2365` |
|
| 129 |
+
| BLEU-4 | `0.0405` | `0.0449` | `0.0484` | `0.0487` | `0.0573` | `0.0509` |
|
| 130 |
+
| METEOR | `0.2207` | `0.2347` | `0.2466` | `0.2240` | `0.2310` | `0.2158` |
|
| 131 |
+
| RadGraph F1 | `0.1010` | `0.1000` | `0.1119` | `0.1181` | `0.1046` | `0.0926` |
|
| 132 |
+
| RadGraph entity F1 | `0.1517` | `0.1577` | `0.1713` | `0.1739` | `0.1584` | `0.1583` |
|
| 133 |
+
| RadGraph relation F1 | `0.1347` | `0.1413` | `0.1549` | `0.1628` | `0.1405` | `0.1428` |
|
| 134 |
+
| CheXpert F1 14-micro | `0.1651` | `0.1442` | `0.1907` | `0.1365` | `0.2921` | `0.2221` |
|
| 135 |
+
| CheXpert F1 5-micro | `0.2152` | `0.1716` | `0.2415` | `0.2455` | `0.2394` | `0.0649` |
|
| 136 |
+
| CheXpert F1 14-macro | `0.1047` | `0.0700` | `0.1039` | `0.0381` | `0.1326` | `0.0758` |
|
| 137 |
+
| CheXpert F1 5-macro | `0.1611` | `0.1112` | `0.1578` | `0.0952` | `0.1636` | `0.0382` |
|
| 138 |
|
| 139 |
## Data
|
| 140 |
|
|
|
|
| 147 |
|
| 148 |
- Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
|
| 149 |
|
| 150 |
+
## Experiment Model Descriptions
|
| 151 |
+
|
| 152 |
+
- `LAnA-MIMIC-CHEXPERT`: This variant was trained on a combined dataset of `CheXpert` and `MIMIC-CXR` using LoRA fine-tuning with the `AdamW` optimizer.
|
| 153 |
+
- `LAnA-MIMIC`: This model was trained on the `MIMIC-CXR (findings-only)` dataset using LoRA fine-tuning with the `AdamW` optimizer.
|
| 154 |
+
- `LAnA`: This model was trained on the `MIMIC-CXR (findings-only)` dataset using full-model optimization with `AdamW` instead of LoRA.
|
| 155 |
+
- `LAnA-v2`: This version keeps the same training setup as `LAnA`, but increases the effective global batch size from `16` to `128`.
|
| 156 |
+
- `LAnA-v3`: This version keeps the same training setup as `LAnA`, including the effective global batch size of `16`, but changes how EOS is handled so that training and generation follow the same behavior. The model no longer uses the EOS token during training, and generation stays greedy but does not stop when an EOS token is produced. The previous setup also decoded greedily, but stopped at EOS and used a maximum of `128` new tokens.
|
| 157 |
+
- `LAnA-v4`: This version keeps the same decoding behavior as `LAnA-v3`, but increases the effective global batch size from `16` to `128`.
|
| 158 |
+
|
| 159 |
## Training Snapshot
|
| 160 |
|
| 161 |
- Run: `LAnA-v4`
|
|
|
|
| 191 |
|
| 192 |
- Set `HF_TOKEN` with permission to access the DINOv3 repositories required by this model before downloading or running inference.
|
| 193 |
- `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
|
| 194 |
+
- `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
|
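The comparison tables in the README report each metric to four decimals, while the saved JSON files keep full precision. A minimal sketch of how one table column could be rendered from a metrics dictionary is shown here. The `ROW_KEYS` mapping and the `format_rows` helper are hypothetical names introduced for illustration; only the JSON keys and the sample values come from this commit.

```python
# Hypothetical mapping from README row labels to metric keys in the JSON files.
ROW_KEYS = {
    "ROUGE-L": "rouge_l",
    "BLEU-1": "bleu_1",
    "BLEU-4": "bleu_4",
    "METEOR": "meteor",
    "RadGraph F1": "radgraph_f1",
    "CheXpert F1 14-micro": "chexpert_f1_14_micro",
}


def format_rows(metrics: dict) -> list[str]:
    """Return Markdown table rows with each value rounded to four decimals."""
    return [
        f"| {label} | `{metrics[key]:.4f}` |"
        for label, key in ROW_KEYS.items()
    ]


# Values copied from the findings-only metrics for LAnA-v4 in this commit.
sample = {
    "rouge_l": 0.17757622652257565,
    "bleu_1": 0.2365366145096764,
    "bleu_4": 0.05092983875760064,
    "meteor": 0.21583177087647024,
    "radgraph_f1": 0.09257418507561303,
    "chexpert_f1_14_micro": 0.22206742825299527,
}

rows = format_rows(sample)
print("\n".join(rows))
```

In practice the dictionary would probably be loaded with `json.load` from one of the files under `evaluations/`. The inline sample keeps the sketch runnable without the repository.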
evaluations/mimic_test_findings_only_metrics.json
CHANGED
|
@@ -4,19 +4,19 @@
|
|
| 4 |
"dataset": "mimic-cxr",
|
| 5 |
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 6 |
"num_examples": 2210,
|
| 7 |
-
"bleu_1": 0.
|
| 8 |
-
"bleu_4": 0.
|
| 9 |
-
"meteor": 0.
|
| 10 |
-
"rouge_l": 0.
|
| 11 |
-
"chexpert_f1_14_micro": 0.
|
| 12 |
-
"chexpert_f1_5_micro": 0.
|
| 13 |
-
"chexpert_f1_14_macro": 0.
|
| 14 |
-
"chexpert_f1_5_macro": 0.
|
| 15 |
-
"chexpert_f1_micro": 0.
|
| 16 |
-
"chexpert_f1_macro": 0.
|
| 17 |
"chexpert_per_label_f1": {
|
| 18 |
-
"Enlarged Cardiomediastinum": 0.
|
| 19 |
-
"Cardiomegaly": 0.
|
| 20 |
"Lung Opacity": 0.0,
|
| 21 |
"Lung Lesion": 0.0,
|
| 22 |
"Edema": 0.0,
|
|
@@ -27,12 +27,12 @@
|
|
| 27 |
"Pleural Effusion": 0.0,
|
| 28 |
"Pleural Other": 0.0,
|
| 29 |
"Fracture": 0.0,
|
| 30 |
-
"Support Devices": 0.
|
| 31 |
-
"No Finding": 0.
|
| 32 |
},
|
| 33 |
-
"radgraph_f1": 0.
|
| 34 |
-
"radgraph_f1_entity": 0.
|
| 35 |
-
"radgraph_f1_relation": 0.
|
| 36 |
"radgraph_available": true,
|
| 37 |
"radgraph_error": null
|
| 38 |
}
|
|
|
|
| 4 |
"dataset": "mimic-cxr",
|
| 5 |
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 6 |
"num_examples": 2210,
|
| 7 |
+
"bleu_1": 0.2365366145096764,
|
| 8 |
+
"bleu_4": 0.05092983875760064,
|
| 9 |
+
"meteor": 0.21583177087647024,
|
| 10 |
+
"rouge_l": 0.17757622652257565,
|
| 11 |
+
"chexpert_f1_14_micro": 0.22206742825299527,
|
| 12 |
+
"chexpert_f1_5_micro": 0.06491372226787182,
|
| 13 |
+
"chexpert_f1_14_macro": 0.07578923935276514,
|
| 14 |
+
"chexpert_f1_5_macro": 0.03816425120772947,
|
| 15 |
+
"chexpert_f1_micro": 0.22206742825299527,
|
| 16 |
+
"chexpert_f1_macro": 0.07578923935276514,
|
| 17 |
"chexpert_per_label_f1": {
|
| 18 |
+
"Enlarged Cardiomediastinum": 0.028169014084507043,
|
| 19 |
+
"Cardiomegaly": 0.19082125603864733,
|
| 20 |
"Lung Opacity": 0.0,
|
| 21 |
"Lung Lesion": 0.0,
|
| 22 |
"Edema": 0.0,
|
|
|
|
| 27 |
"Pleural Effusion": 0.0,
|
| 28 |
"Pleural Other": 0.0,
|
| 29 |
"Fracture": 0.0,
|
| 30 |
+
"Support Devices": 0.4929681717246484,
|
| 31 |
+
"No Finding": 0.3490909090909091
|
| 32 |
},
|
| 33 |
+
"radgraph_f1": 0.09257418507561303,
|
| 34 |
+
"radgraph_f1_entity": 0.15825019075973226,
|
| 35 |
+
"radgraph_f1_relation": 0.1427606782830153,
|
| 36 |
"radgraph_available": true,
|
| 37 |
"radgraph_error": null
|
| 38 |
}
|
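The metrics above report both micro- and macro-averaged CheXpert F1, and the two differ widely when most labels score `0.0`. The sketch below illustrates the general difference between the two averages. The per-label counts are hypothetical and not taken from this evaluation, and treating an undefined F1 as `0.0` is an assumption about the convention used.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro(counts: dict) -> tuple[float, float]:
    """Return (micro F1, macro F1) from per-label (tp, fp, fn) counts."""
    # Macro: average the per-label F1 scores, each label weighted equally.
    per_label = [f1(*c) for c in counts.values()]
    macro = sum(per_label) / len(per_label)
    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Hypothetical counts: one frequent label is predicted well, two are never hit.
counts = {
    "Support Devices": (50, 30, 30),
    "Edema": (0, 0, 40),
    "Fracture": (0, 0, 10),
}

micro, macro = micro_macro(counts)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

With counts like these, the micro average is pulled up by the one label that is predicted well, while the macro average is dragged down by every label that scores zero.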
evaluations/mimic_test_findings_only_predictions.csv
CHANGED
|
The diff for this file is too large to render.
|
|
|
evaluations/mimic_test_metrics.json
CHANGED
|
@@ -4,19 +4,19 @@
|
|
| 4 |
"dataset": "mimic-cxr",
|
| 5 |
"view_filter": "frontal-only (PA/AP)",
|
| 6 |
"num_examples": 3041,
|
| 7 |
-
"bleu_1": 0.
|
| 8 |
-
"bleu_4": 0.
|
| 9 |
-
"meteor": 0.
|
| 10 |
-
"rouge_l": 0.
|
| 11 |
-
"chexpert_f1_14_micro": 0.
|
| 12 |
-
"chexpert_f1_5_micro": 0.
|
| 13 |
-
"chexpert_f1_14_macro": 0.
|
| 14 |
-
"chexpert_f1_5_macro": 0.
|
| 15 |
-
"chexpert_f1_micro": 0.
|
| 16 |
-
"chexpert_f1_macro": 0.
|
| 17 |
"chexpert_per_label_f1": {
|
| 18 |
-
"Enlarged Cardiomediastinum": 0.
|
| 19 |
-
"Cardiomegaly": 0.
|
| 20 |
"Lung Opacity": 0.0,
|
| 21 |
"Lung Lesion": 0.0,
|
| 22 |
"Edema": 0.0,
|
|
@@ -27,12 +27,12 @@
|
|
| 27 |
"Pleural Effusion": 0.0,
|
| 28 |
"Pleural Other": 0.0,
|
| 29 |
"Fracture": 0.0,
|
| 30 |
-
"Support Devices": 0.
|
| 31 |
-
"No Finding": 0.
|
| 32 |
},
|
| 33 |
-
"radgraph_f1": 0.
|
| 34 |
-
"radgraph_f1_entity": 0.
|
| 35 |
-
"radgraph_f1_relation": 0.
|
| 36 |
"radgraph_available": true,
|
| 37 |
"radgraph_error": null,
|
| 38 |
"evaluation_suite": "mimic_test_dual",
|
|
@@ -42,19 +42,19 @@
|
|
| 42 |
"dataset": "mimic-cxr",
|
| 43 |
"view_filter": "frontal-only (PA/AP)",
|
| 44 |
"num_examples": 3041,
|
| 45 |
-
"bleu_1": 0.
|
| 46 |
-
"bleu_4": 0.
|
| 47 |
-
"meteor": 0.
|
| 48 |
-
"rouge_l": 0.
|
| 49 |
-
"chexpert_f1_14_micro": 0.
|
| 50 |
-
"chexpert_f1_5_micro": 0.
|
| 51 |
-
"chexpert_f1_14_macro": 0.
|
| 52 |
-
"chexpert_f1_5_macro": 0.
|
| 53 |
-
"chexpert_f1_micro": 0.
|
| 54 |
-
"chexpert_f1_macro": 0.
|
| 55 |
"chexpert_per_label_f1": {
|
| 56 |
-
"Enlarged Cardiomediastinum": 0.
|
| 57 |
-
"Cardiomegaly": 0.
|
| 58 |
"Lung Opacity": 0.0,
|
| 59 |
"Lung Lesion": 0.0,
|
| 60 |
"Edema": 0.0,
|
|
@@ -65,12 +65,12 @@
|
|
| 65 |
"Pleural Effusion": 0.0,
|
| 66 |
"Pleural Other": 0.0,
|
| 67 |
"Fracture": 0.0,
|
| 68 |
-
"Support Devices": 0.
|
| 69 |
-
"No Finding": 0.
|
| 70 |
},
|
| 71 |
-
"radgraph_f1": 0.
|
| 72 |
-
"radgraph_f1_entity": 0.
|
| 73 |
-
"radgraph_f1_relation": 0.
|
| 74 |
"radgraph_available": true,
|
| 75 |
"radgraph_error": null
|
| 76 |
},
|
|
@@ -80,19 +80,19 @@
|
|
| 80 |
"dataset": "mimic-cxr",
|
| 81 |
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 82 |
"num_examples": 2210,
|
| 83 |
-
"bleu_1": 0.
|
| 84 |
-
"bleu_4": 0.
|
| 85 |
-
"meteor": 0.
|
| 86 |
-
"rouge_l": 0.
|
| 87 |
-
"chexpert_f1_14_micro": 0.
|
| 88 |
-
"chexpert_f1_5_micro": 0.
|
| 89 |
-
"chexpert_f1_14_macro": 0.
|
| 90 |
-
"chexpert_f1_5_macro": 0.
|
| 91 |
-
"chexpert_f1_micro": 0.
|
| 92 |
-
"chexpert_f1_macro": 0.
|
| 93 |
"chexpert_per_label_f1": {
|
| 94 |
-
"Enlarged Cardiomediastinum": 0.
|
| 95 |
-
"Cardiomegaly": 0.
|
| 96 |
"Lung Opacity": 0.0,
|
| 97 |
"Lung Lesion": 0.0,
|
| 98 |
"Edema": 0.0,
|
|
@@ -103,12 +103,12 @@
|
|
| 103 |
"Pleural Effusion": 0.0,
|
| 104 |
"Pleural Other": 0.0,
|
| 105 |
"Fracture": 0.0,
|
| 106 |
-
"Support Devices": 0.
|
| 107 |
-
"No Finding": 0.
|
| 108 |
},
|
| 109 |
-
"radgraph_f1": 0.
|
| 110 |
-
"radgraph_f1_entity": 0.
|
| 111 |
-
"radgraph_f1_relation": 0.
|
| 112 |
"radgraph_available": true,
|
| 113 |
"radgraph_error": null
|
| 114 |
}
|
|
|
|
| 4 |
"dataset": "mimic-cxr",
|
| 5 |
"view_filter": "frontal-only (PA/AP)",
|
| 6 |
"num_examples": 3041,
|
| 7 |
+
"bleu_1": 0.22660013916205363,
|
| 8 |
+
"bleu_4": 0.044065714234343675,
|
| 9 |
+
"meteor": 0.20172974056236193,
|
| 10 |
+
"rouge_l": 0.16930790037433466,
|
| 11 |
+
"chexpert_f1_14_micro": 0.22114848541163354,
|
| 12 |
+
"chexpert_f1_5_micro": 0.05919661733615222,
|
| 13 |
+
"chexpert_f1_14_macro": 0.0760104965288089,
|
| 14 |
+
"chexpert_f1_5_macro": 0.035443037974683546,
|
| 15 |
+
"chexpert_f1_micro": 0.22114848541163354,
|
| 16 |
+
"chexpert_f1_macro": 0.0760104965288089,
|
| 17 |
"chexpert_per_label_f1": {
|
| 18 |
+
"Enlarged Cardiomediastinum": 0.0273972602739726,
|
| 19 |
+
"Cardiomegaly": 0.17721518987341772,
|
| 20 |
"Lung Opacity": 0.0,
|
| 21 |
"Lung Lesion": 0.0,
|
| 22 |
"Edema": 0.0,
|
|
|
|
| 27 |
"Pleural Effusion": 0.0,
|
| 28 |
"Pleural Other": 0.0,
|
| 29 |
"Fracture": 0.0,
|
| 30 |
+
"Support Devices": 0.5660377358490566,
|
| 31 |
+
"No Finding": 0.29349676540687775
|
| 32 |
},
|
| 33 |
+
"radgraph_f1": 0.0813727768314203,
|
| 34 |
+
"radgraph_f1_entity": 0.1452824713307004,
|
| 35 |
+
"radgraph_f1_relation": 0.13105287501866947,
|
| 36 |
"radgraph_available": true,
|
| 37 |
"radgraph_error": null,
|
| 38 |
"evaluation_suite": "mimic_test_dual",
|
|
|
|
| 42 |
"dataset": "mimic-cxr",
|
| 43 |
"view_filter": "frontal-only (PA/AP)",
|
| 44 |
"num_examples": 3041,
|
| 45 |
+
"bleu_1": 0.22660013916205363,
|
| 46 |
+
"bleu_4": 0.044065714234343675,
|
| 47 |
+
"meteor": 0.20172974056236193,
|
| 48 |
+
"rouge_l": 0.16930790037433466,
|
| 49 |
+
"chexpert_f1_14_micro": 0.22114848541163354,
|
| 50 |
+
"chexpert_f1_5_micro": 0.05919661733615222,
|
| 51 |
+
"chexpert_f1_14_macro": 0.0760104965288089,
|
| 52 |
+
"chexpert_f1_5_macro": 0.035443037974683546,
|
| 53 |
+
"chexpert_f1_micro": 0.22114848541163354,
|
| 54 |
+
"chexpert_f1_macro": 0.0760104965288089,
|
| 55 |
"chexpert_per_label_f1": {
|
| 56 |
+
"Enlarged Cardiomediastinum": 0.0273972602739726,
|
| 57 |
+
"Cardiomegaly": 0.17721518987341772,
|
| 58 |
"Lung Opacity": 0.0,
|
| 59 |
"Lung Lesion": 0.0,
|
| 60 |
"Edema": 0.0,
|
|
|
|
| 65 |
"Pleural Effusion": 0.0,
|
| 66 |
"Pleural Other": 0.0,
|
| 67 |
"Fracture": 0.0,
|
| 68 |
+
"Support Devices": 0.5660377358490566,
|
| 69 |
+
"No Finding": 0.29349676540687775
|
| 70 |
},
|
| 71 |
+
"radgraph_f1": 0.0813727768314203,
|
| 72 |
+
"radgraph_f1_entity": 0.1452824713307004,
|
| 73 |
+
"radgraph_f1_relation": 0.13105287501866947,
|
| 74 |
"radgraph_available": true,
|
| 75 |
"radgraph_error": null
|
| 76 |
},
|
|
|
|
| 80 |
"dataset": "mimic-cxr",
|
| 81 |
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 82 |
"num_examples": 2210,
|
| 83 |
+
"bleu_1": 0.2365366145096764,
|
| 84 |
+
"bleu_4": 0.05092983875760064,
|
| 85 |
+
"meteor": 0.21583177087647024,
|
| 86 |
+
"rouge_l": 0.17757622652257565,
|
| 87 |
+
"chexpert_f1_14_micro": 0.22206742825299527,
|
| 88 |
+
"chexpert_f1_5_micro": 0.06491372226787182,
|
| 89 |
+
"chexpert_f1_14_macro": 0.07578923935276514,
|
| 90 |
+
"chexpert_f1_5_macro": 0.03816425120772947,
|
| 91 |
+
"chexpert_f1_micro": 0.22206742825299527,
|
| 92 |
+
"chexpert_f1_macro": 0.07578923935276514,
|
| 93 |
"chexpert_per_label_f1": {
|
| 94 |
+
"Enlarged Cardiomediastinum": 0.028169014084507043,
|
| 95 |
+
"Cardiomegaly": 0.19082125603864733,
|
| 96 |
"Lung Opacity": 0.0,
|
| 97 |
"Lung Lesion": 0.0,
|
| 98 |
"Edema": 0.0,
|
|
|
|
| 103 |
"Pleural Effusion": 0.0,
|
| 104 |
"Pleural Other": 0.0,
|
| 105 |
"Fracture": 0.0,
|
| 106 |
+
"Support Devices": 0.4929681717246484,
|
| 107 |
+
"No Finding": 0.3490909090909091
|
| 108 |
},
|
| 109 |
+
"radgraph_f1": 0.09257418507561303,
|
| 110 |
+
"radgraph_f1_entity": 0.15825019075973226,
|
| 111 |
+
"radgraph_f1_relation": 0.1427606782830153,
|
| 112 |
"radgraph_available": true,
|
| 113 |
"radgraph_error": null
|
| 114 |
}
|
evaluations/mimic_test_predictions.csv
CHANGED
|
The diff for this file is too large to render.
|
|
|
run_summary.json
CHANGED
|
@@ -44,5 +44,122 @@
|
|
| 44 |
"target_duration_mode": "per_invocation",
|
| 45 |
"repo_id": "manu02/LAnA-v4",
|
| 46 |
"train_datasets": "MIMIC-CXR (findings-only)",
|
| 47 |
-
"validation_datasets": "MIMIC-CXR (findings-only)"
|
| 48 |
}
|
|
|
|
| 44 |
"target_duration_mode": "per_invocation",
|
| 45 |
"repo_id": "manu02/LAnA-v4",
|
| 46 |
"train_datasets": "MIMIC-CXR (findings-only)",
|
| 47 |
+
"validation_datasets": "MIMIC-CXR (findings-only)",
|
| 48 |
+
"repo_url": "https://huggingface.co/manu02/LAnA-v4",
|
| 49 |
+
"latest_evaluation": {
|
| 50 |
+
"split": "test",
|
| 51 |
+
"subset": "all frontal studies",
|
| 52 |
+
"dataset": "mimic-cxr",
|
| 53 |
+
"view_filter": "frontal-only (PA/AP)",
|
| 54 |
+
"num_examples": 3041,
|
| 55 |
+
"bleu_1": 0.22660013916205363,
|
| 56 |
+
"bleu_4": 0.044065714234343675,
|
| 57 |
+
"meteor": 0.20172974056236193,
|
| 58 |
+
"rouge_l": 0.16930790037433466,
|
| 59 |
+
"chexpert_f1_14_micro": 0.22114848541163354,
|
| 60 |
+
"chexpert_f1_5_micro": 0.05919661733615222,
|
| 61 |
+
"chexpert_f1_14_macro": 0.0760104965288089,
|
| 62 |
+
"chexpert_f1_5_macro": 0.035443037974683546,
|
| 63 |
+
"chexpert_f1_micro": 0.22114848541163354,
|
| 64 |
+
"chexpert_f1_macro": 0.0760104965288089,
|
| 65 |
+
"chexpert_per_label_f1": {
|
| 66 |
+
"Enlarged Cardiomediastinum": 0.0273972602739726,
|
| 67 |
+
"Cardiomegaly": 0.17721518987341772,
|
| 68 |
+
"Lung Opacity": 0.0,
|
| 69 |
+
"Lung Lesion": 0.0,
|
| 70 |
+
"Edema": 0.0,
|
| 71 |
+
"Consolidation": 0.0,
|
| 72 |
+
"Pneumonia": 0.0,
|
| 73 |
+
"Atelectasis": 0.0,
|
| 74 |
+
"Pneumothorax": 0.0,
|
| 75 |
+
"Pleural Effusion": 0.0,
|
| 76 |
+
"Pleural Other": 0.0,
|
| 77 |
+
"Fracture": 0.0,
|
| 78 |
+
"Support Devices": 0.5660377358490566,
|
| 79 |
+
"No Finding": 0.29349676540687775
|
| 80 |
+
},
|
| 81 |
+
"radgraph_f1": 0.0813727768314203,
|
| 82 |
+
"radgraph_f1_entity": 0.1452824713307004,
|
| 83 |
+
"radgraph_f1_relation": 0.13105287501866947,
|
| 84 |
+
"radgraph_available": true,
|
| 85 |
+
"radgraph_error": null
|
| 86 |
+
},
|
| 87 |
+
"latest_evaluations": {
|
| 88 |
+
"all_test": {
|
| 89 |
+
"split": "test",
|
| 90 |
+
"subset": "all frontal studies",
|
| 91 |
+
"dataset": "mimic-cxr",
|
| 92 |
+
"view_filter": "frontal-only (PA/AP)",
|
| 93 |
+
"num_examples": 3041,
|
| 94 |
+
"bleu_1": 0.22660013916205363,
|
| 95 |
+
"bleu_4": 0.044065714234343675,
|
| 96 |
+
"meteor": 0.20172974056236193,
|
| 97 |
+
"rouge_l": 0.16930790037433466,
|
| 98 |
+
"chexpert_f1_14_micro": 0.22114848541163354,
|
| 99 |
+
"chexpert_f1_5_micro": 0.05919661733615222,
|
| 100 |
+
"chexpert_f1_14_macro": 0.0760104965288089,
|
| 101 |
+
"chexpert_f1_5_macro": 0.035443037974683546,
|
| 102 |
+
"chexpert_f1_micro": 0.22114848541163354,
|
| 103 |
+
"chexpert_f1_macro": 0.0760104965288089,
|
| 104 |
+
"chexpert_per_label_f1": {
|
| 105 |
+
"Enlarged Cardiomediastinum": 0.0273972602739726,
|
| 106 |
+
"Cardiomegaly": 0.17721518987341772,
|
| 107 |
+
"Lung Opacity": 0.0,
|
| 108 |
+
"Lung Lesion": 0.0,
|
| 109 |
+
"Edema": 0.0,
|
| 110 |
+
"Consolidation": 0.0,
|
| 111 |
+
"Pneumonia": 0.0,
|
| 112 |
+
"Atelectasis": 0.0,
|
| 113 |
+
"Pneumothorax": 0.0,
|
| 114 |
+
"Pleural Effusion": 0.0,
|
| 115 |
+
"Pleural Other": 0.0,
|
| 116 |
+
"Fracture": 0.0,
|
| 117 |
+
"Support Devices": 0.5660377358490566,
|
| 118 |
+
"No Finding": 0.29349676540687775
|
| 119 |
+
},
|
| 120 |
+
"radgraph_f1": 0.0813727768314203,
|
| 121 |
+
"radgraph_f1_entity": 0.1452824713307004,
|
| 122 |
+
"radgraph_f1_relation": 0.13105287501866947,
|
| 123 |
+
"radgraph_available": true,
|
| 124 |
+
"radgraph_error": null
|
| 125 |
+
},
|
| 126 |
+
"findings_only_test": {
|
| 127 |
+
"split": "test",
|
| 128 |
+
"subset": "findings-only frontal studies",
|
| 129 |
+
"dataset": "mimic-cxr",
|
| 130 |
+
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 131 |
+
"num_examples": 2210,
|
| 132 |
+
"bleu_1": 0.2365366145096764,
|
| 133 |
+
"bleu_4": 0.05092983875760064,
|
| 134 |
+
"meteor": 0.21583177087647024,
|
| 135 |
+
"rouge_l": 0.17757622652257565,
|
| 136 |
+
"chexpert_f1_14_micro": 0.22206742825299527,
|
| 137 |
+
"chexpert_f1_5_micro": 0.06491372226787182,
|
| 138 |
+
"chexpert_f1_14_macro": 0.07578923935276514,
|
| 139 |
+
"chexpert_f1_5_macro": 0.03816425120772947,
|
| 140 |
+
"chexpert_f1_micro": 0.22206742825299527,
|
| 141 |
+
"chexpert_f1_macro": 0.07578923935276514,
|
| 142 |
+
"chexpert_per_label_f1": {
|
| 143 |
+
"Enlarged Cardiomediastinum": 0.028169014084507043,
|
| 144 |
+
"Cardiomegaly": 0.19082125603864733,
|
| 145 |
+
"Lung Opacity": 0.0,
|
| 146 |
+
"Lung Lesion": 0.0,
|
| 147 |
+
"Edema": 0.0,
|
| 148 |
+
"Consolidation": 0.0,
|
| 149 |
+
"Pneumonia": 0.0,
|
| 150 |
+
"Atelectasis": 0.0,
|
| 151 |
+
"Pneumothorax": 0.0,
|
| 152 |
+
"Pleural Effusion": 0.0,
|
| 153 |
+
"Pleural Other": 0.0,
|
| 154 |
+
"Fracture": 0.0,
|
| 155 |
+
"Support Devices": 0.4929681717246484,
|
| 156 |
+
"No Finding": 0.3490909090909091
|
| 157 |
+
},
|
| 158 |
+
"radgraph_f1": 0.09257418507561303,
|
| 159 |
+
"radgraph_f1_entity": 0.15825019075973226,
|
| 160 |
+
"radgraph_f1_relation": 0.1427606782830153,
|
| 161 |
+
"radgraph_available": true,
|
| 162 |
+
"radgraph_error": null
|
| 163 |
+
}
|
| 164 |
+
}
|
| 165 |
}
|
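The macro scores in `run_summary.json` can be cross-checked against the per-label values. The sketch below recomputes `chexpert_f1_14_macro` as the unweighted mean of the 14 per-label F1 scores from the `all_test` block. The five-label subset is an assumption (the standard CheXpert competition labels), since this diff does not state which five labels are used, and `macro_f1` is a hypothetical helper.

```python
import math

# Per-label F1 values copied from the "all_test" block in run_summary.json.
per_label_f1 = {
    "Enlarged Cardiomediastinum": 0.0273972602739726,
    "Cardiomegaly": 0.17721518987341772,
    "Lung Opacity": 0.0,
    "Lung Lesion": 0.0,
    "Edema": 0.0,
    "Consolidation": 0.0,
    "Pneumonia": 0.0,
    "Atelectasis": 0.0,
    "Pneumothorax": 0.0,
    "Pleural Effusion": 0.0,
    "Pleural Other": 0.0,
    "Fracture": 0.0,
    "Support Devices": 0.5660377358490566,
    "No Finding": 0.29349676540687775,
}
reported_14_macro = 0.0760104965288089
reported_5_macro = 0.035443037974683546

# Assumption: the five-label subset is the usual CheXpert competition set.
FIVE_LABELS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion",
]


def macro_f1(scores: dict, labels=None) -> float:
    """Unweighted mean of per-label F1 over the chosen labels."""
    names = list(scores) if labels is None else labels
    return sum(scores[name] for name in names) / len(names)


macro_14 = macro_f1(per_label_f1)
macro_5 = macro_f1(per_label_f1, FIVE_LABELS)
print(math.isclose(macro_14, reported_14_macro, rel_tol=1e-9))
print(math.isclose(macro_5, reported_5_macro, rel_tol=1e-9))
```

Both recomputed values agree with the reported ones, which is consistent with the assumed five-label subset: `Cardiomegaly` is the only one of those five labels with a nonzero F1 in this run.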