manu02 committed on
Commit
e090d0d
·
verified ·
1 Parent(s): de3c1ea

Upload MIMIC test evaluation results

README.md CHANGED
@@ -98,34 +98,43 @@ print(report)
98
 
99
  Frontal-only evaluation using `PA/AP` studies only.
100
 
101
- ### Current Checkpoint Results
102
-
103
- | Metric | Value |
104
- | --- | --- |
105
- | Number of studies | TBD |
106
- | RadGraph F1 | TBD |
107
- | RadGraph entity F1 | TBD |
108
- | RadGraph relation F1 | TBD |
109
- | CheXpert F1 14-micro | TBD |
110
- | CheXpert F1 5-micro | TBD |
111
- | CheXpert F1 14-macro | TBD |
112
- | CheXpert F1 5-macro | TBD |
113
-
114
- ### Final Completed Training Results
115
-
116
- The final table will be populated when the planned training run is completed. Until then, final-report metrics remain `TBD`.
117
-
118
- | Metric | Value |
119
- | --- | --- |
120
- | Number of studies | TBD |
121
- | RadGraph F1 | TBD |
122
- | RadGraph entity F1 | TBD |
123
- | RadGraph relation F1 | TBD |
124
- | CheXpert F1 14-micro | TBD |
125
- | CheXpert F1 5-micro | TBD |
126
- | CheXpert F1 14-macro | TBD |
127
- | CheXpert F1 5-macro | TBD |
128
-
129
 
130
  ## Data
131
 
@@ -138,6 +147,15 @@ The final table will be populated when the planned training run is completed. Un
138
 
139
  - Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
140
 
141
  ## Training Snapshot
142
 
143
  - Run: `LAnA-v4`
@@ -173,4 +191,4 @@ The final table will be populated when the planned training run is completed. Un
173
 
174
  - Set `HF_TOKEN` with permission to access the DINOv3 repositories required by this model before downloading or running inference.
175
  - `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
176
- - `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
 
98
 
99
  Frontal-only evaluation using `PA/AP` studies only.
100
 
101
+ These comparison tables are refreshed across the full LAnA collection whenever any model in the collection is evaluated.
102
+
103
+ ### Cross-Model Comparison: All Frontal Test Studies
104
+
105
+ | Metric | LAnA-MIMIC-CHEXPERT | LAnA-MIMIC | LAnA | LAnA-v2 | LAnA-v3 | LAnA-v4 (Model still training) |
106
+ | --- | --- | --- | --- | --- | --- | --- |
107
+ | Run status | `Completed` | `Completed` | `Completed` | `Completed` | `Completed` | `Model still training` |
108
+ | Number of studies | `3041` | `3041` | `3041` | `3041` | `3041` | `3041` |
109
+ | ROUGE-L | `0.1513` | `0.1653` | `0.1686` | `0.1670` | `0.1745` | `0.1693` |
110
+ | BLEU-1 | `0.1707` | `0.1916` | `0.2091` | `0.2174` | `0.2346` | `0.2266` |
111
+ | BLEU-4 | `0.0357` | `0.0386` | `0.0417` | `0.0417` | `0.0484` | `0.0441` |
112
+ | METEOR | `0.2079` | `0.2202` | `0.2298` | `0.2063` | `0.2129` | `0.2017` |
113
+ | RadGraph F1 | `0.0918` | `0.0921` | `0.1024` | `0.1057` | `0.0939` | `0.0814` |
114
+ | RadGraph entity F1 | `0.1399` | `0.1459` | `0.1587` | `0.1569` | `0.1441` | `0.1453` |
115
+ | RadGraph relation F1 | `0.1246` | `0.1322` | `0.1443` | `0.1474` | `0.1280` | `0.1311` |
116
+ | CheXpert F1 14-micro | `0.1829` | `0.1565` | `0.2116` | `0.1401` | `0.3116` | `0.2211` |
117
+ | CheXpert F1 5-micro | `0.2183` | `0.1530` | `0.2512` | `0.2506` | `0.2486` | `0.0592` |
118
+ | CheXpert F1 14-macro | `0.1095` | `0.0713` | `0.1095` | `0.0401` | `0.1363` | `0.0760` |
119
+ | CheXpert F1 5-macro | `0.1634` | `0.1007` | `0.1644` | `0.1004` | `0.1686` | `0.0354` |
120
+
121
+ ### Cross-Model Comparison: Findings-Only Frontal Test Studies
122
+
123
+ | Metric | LAnA-MIMIC-CHEXPERT | LAnA-MIMIC | LAnA | LAnA-v2 | LAnA-v3 | LAnA-v4 (Model still training) |
124
+ | --- | --- | --- | --- | --- | --- | --- |
125
+ | Run status | `Completed` | `Completed` | `Completed` | `Completed` | `Completed` | `Model still training` |
126
+ | Number of studies | `2210` | `2210` | `2210` | `2210` | `2210` | `2210` |
127
+ | ROUGE-L | `0.1576` | `0.1720` | `0.1771` | `0.1771` | `0.1848` | `0.1776` |
128
+ | BLEU-1 | `0.1754` | `0.2003` | `0.2177` | `0.2263` | `0.2480` | `0.2365` |
129
+ | BLEU-4 | `0.0405` | `0.0449` | `0.0484` | `0.0487` | `0.0573` | `0.0509` |
130
+ | METEOR | `0.2207` | `0.2347` | `0.2466` | `0.2240` | `0.2310` | `0.2158` |
131
+ | RadGraph F1 | `0.1010` | `0.1000` | `0.1119` | `0.1181` | `0.1046` | `0.0926` |
132
+ | RadGraph entity F1 | `0.1517` | `0.1577` | `0.1713` | `0.1739` | `0.1584` | `0.1583` |
133
+ | RadGraph relation F1 | `0.1347` | `0.1413` | `0.1549` | `0.1628` | `0.1405` | `0.1428` |
134
+ | CheXpert F1 14-micro | `0.1651` | `0.1442` | `0.1907` | `0.1365` | `0.2921` | `0.2221` |
135
+ | CheXpert F1 5-micro | `0.2152` | `0.1716` | `0.2415` | `0.2455` | `0.2394` | `0.0649` |
136
+ | CheXpert F1 14-macro | `0.1047` | `0.0700` | `0.1039` | `0.0381` | `0.1326` | `0.0758` |
137
+ | CheXpert F1 5-macro | `0.1611` | `0.1112` | `0.1578` | `0.0952` | `0.1636` | `0.0382` |
138
 
139
  ## Data
140
 
 
147
 
148
  - Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
149
 
150
+ ## Experiment Model Descriptions
151
+
152
+ - `LAnA-MIMIC-CHEXPERT`: This variant was trained on a combined dataset of `CheXpert` and `MIMIC-CXR` using LoRA fine-tuning with the `AdamW` optimizer.
153
+ - `LAnA-MIMIC`: This model was trained on the `MIMIC-CXR (findings-only)` dataset using LoRA fine-tuning with the `AdamW` optimizer.
154
+ - `LAnA`: This model was trained on the `MIMIC-CXR (findings-only)` dataset using full-model optimization with `AdamW` instead of LoRA.
155
+ - `LAnA-v2`: This version keeps the same training setup as `LAnA`, but increases the effective global batch size from `16` to `128`.
156
+ - `LAnA-v3`: This version keeps the same training setup as `LAnA`, including the effective global batch size of `16`, but changes EOS handling so that training and generation behave consistently. The EOS token is no longer used during training, and generation remains greedy but does not stop when an EOS token is produced. In the previous setup, decoding was also greedy but stopped at EOS and used a maximum of `128` new tokens.
157
+ - `LAnA-v4`: This version keeps the same decoding behavior as `LAnA-v3`, but increases the effective global batch size from `16` to `128`.
158
+
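The decoding difference described for `LAnA-v3` can be illustrated with a minimal sketch. It is not the repository's generation code: `greedy_decode`, `toy_scores`, and the `stop_at_eos` flag are hypothetical names, and the toy scorer stands in for a real model. It only shows how greedy decoding behaves with and without stopping at EOS.

```python
from typing import Callable, List


def greedy_decode(
    next_token_scores: Callable[[List[int]], List[float]],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int,
    stop_at_eos: bool,
) -> List[int]:
    """Return the newly generated token ids (hypothetical helper)."""
    tokens = list(prompt)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Greedy step: always pick the highest-scoring token.
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
        generated.append(best)
        if stop_at_eos and best == eos_id:
            break
    return generated


def toy_scores(tokens: List[int]) -> List[float]:
    """Toy stand-in for a model: deterministic one-hot scores over 4 tokens."""
    best = len(tokens) % 4
    return [1.0 if i == best else 0.0 for i in range(4)]


# Token 0 plays the role of EOS in this toy vocabulary.
with_stop = greedy_decode(toy_scores, [1], eos_id=0, max_new_tokens=6, stop_at_eos=True)
without_stop = greedy_decode(toy_scores, [1], eos_id=0, max_new_tokens=6, stop_at_eos=False)
```

With `stop_at_eos=True` the sequence ends at the first EOS; with `stop_at_eos=False` generation continues until the token budget is exhausted.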
159
  ## Training Snapshot
160
 
161
  - Run: `LAnA-v4`
 
191
 
192
  - Set `HF_TOKEN` with permission to access the DINOv3 repositories required by this model before downloading or running inference.
193
  - `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
194
+ - `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
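The `LAnA-v4` column of the comparison tables appears to be the saved JSON metrics rounded to four decimals. The sketch below shows one way such rows could be produced. The commit does not show how the README tables are actually generated, so `METRIC_ROWS` and `format_rows` are hypothetical, and the sample values are copied from the all-frontal metrics in this commit.

```python
import json
from typing import Dict, List

# Hypothetical mapping from README row labels to JSON keys.
METRIC_ROWS = [
    ("ROUGE-L", "rouge_l"),
    ("BLEU-1", "bleu_1"),
    ("BLEU-4", "bleu_4"),
    ("METEOR", "meteor"),
    ("RadGraph F1", "radgraph_f1"),
]


def format_rows(metrics: Dict[str, float]) -> List[str]:
    """Render a two-column Markdown table body from a metrics dict."""
    rows = [f"| Number of studies | `{metrics['num_examples']}` |"]
    for label, key in METRIC_ROWS:
        rows.append(f"| {label} | `{metrics[key]:.4f}` |")
    return rows


# Inline sample so the sketch runs without the repository files.
sample = json.loads("""
{
  "num_examples": 3041,
  "bleu_1": 0.22660013916205363,
  "bleu_4": 0.044065714234343675,
  "meteor": 0.20172974056236193,
  "rouge_l": 0.16930790037433466,
  "radgraph_f1": 0.0813727768314203
}
""")

print("\n".join(format_rows(sample)))
```

In practice the dict would come from `json.load` on `evaluations/mimic_test_metrics.json` rather than from an inline string.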
evaluations/mimic_test_findings_only_metrics.json CHANGED
@@ -4,19 +4,19 @@
4
  "dataset": "mimic-cxr",
5
  "view_filter": "frontal-only (PA/AP), structured Findings section only",
6
  "num_examples": 2210,
7
- "bleu_1": 0.23179623497255453,
8
- "bleu_4": 0.04902797277049711,
9
- "meteor": 0.2125228998859286,
10
- "rouge_l": 0.17742184702691816,
11
- "chexpert_f1_14_micro": 0.21896792189679218,
12
- "chexpert_f1_5_micro": 0.05121638924455826,
13
- "chexpert_f1_14_macro": 0.07088977497872763,
14
- "chexpert_f1_5_macro": 0.03252032520325203,
15
- "chexpert_f1_micro": 0.21896792189679218,
16
- "chexpert_f1_macro": 0.07088977497872763,
17
  "chexpert_per_label_f1": {
18
- "Enlarged Cardiomediastinum": 0.0,
19
- "Cardiomegaly": 0.16260162601626016,
20
  "Lung Opacity": 0.0,
21
  "Lung Lesion": 0.0,
22
  "Edema": 0.0,
@@ -27,12 +27,12 @@
27
  "Pleural Effusion": 0.0,
28
  "Pleural Other": 0.0,
29
  "Fracture": 0.0,
30
- "Support Devices": 0.48493543758967,
31
- "No Finding": 0.3449197860962567
32
  },
33
- "radgraph_f1": 0.088204837339789,
34
- "radgraph_f1_entity": 0.15309947777272206,
35
- "radgraph_f1_relation": 0.1377130122662026,
36
  "radgraph_available": true,
37
  "radgraph_error": null
38
  }
 
4
  "dataset": "mimic-cxr",
5
  "view_filter": "frontal-only (PA/AP), structured Findings section only",
6
  "num_examples": 2210,
7
+ "bleu_1": 0.2365366145096764,
8
+ "bleu_4": 0.05092983875760064,
9
+ "meteor": 0.21583177087647024,
10
+ "rouge_l": 0.17757622652257565,
11
+ "chexpert_f1_14_micro": 0.22206742825299527,
12
+ "chexpert_f1_5_micro": 0.06491372226787182,
13
+ "chexpert_f1_14_macro": 0.07578923935276514,
14
+ "chexpert_f1_5_macro": 0.03816425120772947,
15
+ "chexpert_f1_micro": 0.22206742825299527,
16
+ "chexpert_f1_macro": 0.07578923935276514,
17
  "chexpert_per_label_f1": {
18
+ "Enlarged Cardiomediastinum": 0.028169014084507043,
19
+ "Cardiomegaly": 0.19082125603864733,
20
  "Lung Opacity": 0.0,
21
  "Lung Lesion": 0.0,
22
  "Edema": 0.0,
 
27
  "Pleural Effusion": 0.0,
28
  "Pleural Other": 0.0,
29
  "Fracture": 0.0,
30
+ "Support Devices": 0.4929681717246484,
31
+ "No Finding": 0.3490909090909091
32
  },
33
+ "radgraph_f1": 0.09257418507561303,
34
+ "radgraph_f1_entity": 0.15825019075973226,
35
+ "radgraph_f1_relation": 0.1427606782830153,
36
  "radgraph_available": true,
37
  "radgraph_error": null
38
  }
evaluations/mimic_test_findings_only_predictions.csv CHANGED
The diff for this file is too large to render. See raw diff
 
evaluations/mimic_test_metrics.json CHANGED
@@ -4,19 +4,19 @@
4
  "dataset": "mimic-cxr",
5
  "view_filter": "frontal-only (PA/AP)",
6
  "num_examples": 3041,
7
- "bleu_1": 0.2227295189394753,
8
- "bleu_4": 0.042573846171241554,
9
- "meteor": 0.19888441013163247,
10
- "rouge_l": 0.16924689298814102,
11
- "chexpert_f1_14_micro": 0.21826654240447343,
12
- "chexpert_f1_5_micro": 0.047995636760294516,
13
- "chexpert_f1_14_macro": 0.07155355415264575,
14
- "chexpert_f1_5_macro": 0.030688753269398433,
15
- "chexpert_f1_micro": 0.21826654240447343,
16
- "chexpert_f1_macro": 0.07155355415264575,
17
  "chexpert_per_label_f1": {
18
- "Enlarged Cardiomediastinum": 0.0,
19
- "Cardiomegaly": 0.15344376634699217,
20
  "Lung Opacity": 0.0,
21
  "Lung Lesion": 0.0,
22
  "Edema": 0.0,
@@ -27,12 +27,12 @@
27
  "Pleural Effusion": 0.0,
28
  "Pleural Other": 0.0,
29
  "Fracture": 0.0,
30
- "Support Devices": 0.5560807907176623,
31
- "No Finding": 0.29222520107238603
32
  },
33
- "radgraph_f1": 0.07782635321975587,
34
- "radgraph_f1_entity": 0.1406092548436011,
35
- "radgraph_f1_relation": 0.1263392751416026,
36
  "radgraph_available": true,
37
  "radgraph_error": null,
38
  "evaluation_suite": "mimic_test_dual",
@@ -42,19 +42,19 @@
42
  "dataset": "mimic-cxr",
43
  "view_filter": "frontal-only (PA/AP)",
44
  "num_examples": 3041,
45
- "bleu_1": 0.2227295189394753,
46
- "bleu_4": 0.042573846171241554,
47
- "meteor": 0.19888441013163247,
48
- "rouge_l": 0.16924689298814102,
49
- "chexpert_f1_14_micro": 0.21826654240447343,
50
- "chexpert_f1_5_micro": 0.047995636760294516,
51
- "chexpert_f1_14_macro": 0.07155355415264575,
52
- "chexpert_f1_5_macro": 0.030688753269398433,
53
- "chexpert_f1_micro": 0.21826654240447343,
54
- "chexpert_f1_macro": 0.07155355415264575,
55
  "chexpert_per_label_f1": {
56
- "Enlarged Cardiomediastinum": 0.0,
57
- "Cardiomegaly": 0.15344376634699217,
58
  "Lung Opacity": 0.0,
59
  "Lung Lesion": 0.0,
60
  "Edema": 0.0,
@@ -65,12 +65,12 @@
65
  "Pleural Effusion": 0.0,
66
  "Pleural Other": 0.0,
67
  "Fracture": 0.0,
68
- "Support Devices": 0.5560807907176623,
69
- "No Finding": 0.29222520107238603
70
  },
71
- "radgraph_f1": 0.07782635321975587,
72
- "radgraph_f1_entity": 0.1406092548436011,
73
- "radgraph_f1_relation": 0.1263392751416026,
74
  "radgraph_available": true,
75
  "radgraph_error": null
76
  },
@@ -80,19 +80,19 @@
80
  "dataset": "mimic-cxr",
81
  "view_filter": "frontal-only (PA/AP), structured Findings section only",
82
  "num_examples": 2210,
83
- "bleu_1": 0.23179623497255453,
84
- "bleu_4": 0.04902797277049711,
85
- "meteor": 0.2125228998859286,
86
- "rouge_l": 0.17742184702691816,
87
- "chexpert_f1_14_micro": 0.21896792189679218,
88
- "chexpert_f1_5_micro": 0.05121638924455826,
89
- "chexpert_f1_14_macro": 0.07088977497872763,
90
- "chexpert_f1_5_macro": 0.03252032520325203,
91
- "chexpert_f1_micro": 0.21896792189679218,
92
- "chexpert_f1_macro": 0.07088977497872763,
93
  "chexpert_per_label_f1": {
94
- "Enlarged Cardiomediastinum": 0.0,
95
- "Cardiomegaly": 0.16260162601626016,
96
  "Lung Opacity": 0.0,
97
  "Lung Lesion": 0.0,
98
  "Edema": 0.0,
@@ -103,12 +103,12 @@
103
  "Pleural Effusion": 0.0,
104
  "Pleural Other": 0.0,
105
  "Fracture": 0.0,
106
- "Support Devices": 0.48493543758967,
107
- "No Finding": 0.3449197860962567
108
  },
109
- "radgraph_f1": 0.088204837339789,
110
- "radgraph_f1_entity": 0.15309947777272206,
111
- "radgraph_f1_relation": 0.1377130122662026,
112
  "radgraph_available": true,
113
  "radgraph_error": null
114
  }
 
4
  "dataset": "mimic-cxr",
5
  "view_filter": "frontal-only (PA/AP)",
6
  "num_examples": 3041,
7
+ "bleu_1": 0.22660013916205363,
8
+ "bleu_4": 0.044065714234343675,
9
+ "meteor": 0.20172974056236193,
10
+ "rouge_l": 0.16930790037433466,
11
+ "chexpert_f1_14_micro": 0.22114848541163354,
12
+ "chexpert_f1_5_micro": 0.05919661733615222,
13
+ "chexpert_f1_14_macro": 0.0760104965288089,
14
+ "chexpert_f1_5_macro": 0.035443037974683546,
15
+ "chexpert_f1_micro": 0.22114848541163354,
16
+ "chexpert_f1_macro": 0.0760104965288089,
17
  "chexpert_per_label_f1": {
18
+ "Enlarged Cardiomediastinum": 0.0273972602739726,
19
+ "Cardiomegaly": 0.17721518987341772,
20
  "Lung Opacity": 0.0,
21
  "Lung Lesion": 0.0,
22
  "Edema": 0.0,
 
27
  "Pleural Effusion": 0.0,
28
  "Pleural Other": 0.0,
29
  "Fracture": 0.0,
30
+ "Support Devices": 0.5660377358490566,
31
+ "No Finding": 0.29349676540687775
32
  },
33
+ "radgraph_f1": 0.0813727768314203,
34
+ "radgraph_f1_entity": 0.1452824713307004,
35
+ "radgraph_f1_relation": 0.13105287501866947,
36
  "radgraph_available": true,
37
  "radgraph_error": null,
38
  "evaluation_suite": "mimic_test_dual",
 
42
  "dataset": "mimic-cxr",
43
  "view_filter": "frontal-only (PA/AP)",
44
  "num_examples": 3041,
45
+ "bleu_1": 0.22660013916205363,
46
+ "bleu_4": 0.044065714234343675,
47
+ "meteor": 0.20172974056236193,
48
+ "rouge_l": 0.16930790037433466,
49
+ "chexpert_f1_14_micro": 0.22114848541163354,
50
+ "chexpert_f1_5_micro": 0.05919661733615222,
51
+ "chexpert_f1_14_macro": 0.0760104965288089,
52
+ "chexpert_f1_5_macro": 0.035443037974683546,
53
+ "chexpert_f1_micro": 0.22114848541163354,
54
+ "chexpert_f1_macro": 0.0760104965288089,
55
  "chexpert_per_label_f1": {
56
+ "Enlarged Cardiomediastinum": 0.0273972602739726,
57
+ "Cardiomegaly": 0.17721518987341772,
58
  "Lung Opacity": 0.0,
59
  "Lung Lesion": 0.0,
60
  "Edema": 0.0,
 
65
  "Pleural Effusion": 0.0,
66
  "Pleural Other": 0.0,
67
  "Fracture": 0.0,
68
+ "Support Devices": 0.5660377358490566,
69
+ "No Finding": 0.29349676540687775
70
  },
71
+ "radgraph_f1": 0.0813727768314203,
72
+ "radgraph_f1_entity": 0.1452824713307004,
73
+ "radgraph_f1_relation": 0.13105287501866947,
74
  "radgraph_available": true,
75
  "radgraph_error": null
76
  },
 
80
  "dataset": "mimic-cxr",
81
  "view_filter": "frontal-only (PA/AP), structured Findings section only",
82
  "num_examples": 2210,
83
+ "bleu_1": 0.2365366145096764,
84
+ "bleu_4": 0.05092983875760064,
85
+ "meteor": 0.21583177087647024,
86
+ "rouge_l": 0.17757622652257565,
87
+ "chexpert_f1_14_micro": 0.22206742825299527,
88
+ "chexpert_f1_5_micro": 0.06491372226787182,
89
+ "chexpert_f1_14_macro": 0.07578923935276514,
90
+ "chexpert_f1_5_macro": 0.03816425120772947,
91
+ "chexpert_f1_micro": 0.22206742825299527,
92
+ "chexpert_f1_macro": 0.07578923935276514,
93
  "chexpert_per_label_f1": {
94
+ "Enlarged Cardiomediastinum": 0.028169014084507043,
95
+ "Cardiomegaly": 0.19082125603864733,
96
  "Lung Opacity": 0.0,
97
  "Lung Lesion": 0.0,
98
  "Edema": 0.0,
 
103
  "Pleural Effusion": 0.0,
104
  "Pleural Other": 0.0,
105
  "Fracture": 0.0,
106
+ "Support Devices": 0.4929681717246484,
107
+ "No Finding": 0.3490909090909091
108
  },
109
+ "radgraph_f1": 0.09257418507561303,
110
+ "radgraph_f1_entity": 0.15825019075973226,
111
+ "radgraph_f1_relation": 0.1427606782830153,
112
  "radgraph_available": true,
113
  "radgraph_error": null
114
  }
evaluations/mimic_test_predictions.csv CHANGED
The diff for this file is too large to render. See raw diff
 
run_summary.json CHANGED
@@ -44,5 +44,122 @@
44
  "target_duration_mode": "per_invocation",
45
  "repo_id": "manu02/LAnA-v4",
46
  "train_datasets": "MIMIC-CXR (findings-only)",
47
- "validation_datasets": "MIMIC-CXR (findings-only)"
48
  }
 
44
  "target_duration_mode": "per_invocation",
45
  "repo_id": "manu02/LAnA-v4",
46
  "train_datasets": "MIMIC-CXR (findings-only)",
47
+ "validation_datasets": "MIMIC-CXR (findings-only)",
48
+ "repo_url": "https://huggingface.co/manu02/LAnA-v4",
49
+ "latest_evaluation": {
50
+ "split": "test",
51
+ "subset": "all frontal studies",
52
+ "dataset": "mimic-cxr",
53
+ "view_filter": "frontal-only (PA/AP)",
54
+ "num_examples": 3041,
55
+ "bleu_1": 0.22660013916205363,
56
+ "bleu_4": 0.044065714234343675,
57
+ "meteor": 0.20172974056236193,
58
+ "rouge_l": 0.16930790037433466,
59
+ "chexpert_f1_14_micro": 0.22114848541163354,
60
+ "chexpert_f1_5_micro": 0.05919661733615222,
61
+ "chexpert_f1_14_macro": 0.0760104965288089,
62
+ "chexpert_f1_5_macro": 0.035443037974683546,
63
+ "chexpert_f1_micro": 0.22114848541163354,
64
+ "chexpert_f1_macro": 0.0760104965288089,
65
+ "chexpert_per_label_f1": {
66
+ "Enlarged Cardiomediastinum": 0.0273972602739726,
67
+ "Cardiomegaly": 0.17721518987341772,
68
+ "Lung Opacity": 0.0,
69
+ "Lung Lesion": 0.0,
70
+ "Edema": 0.0,
71
+ "Consolidation": 0.0,
72
+ "Pneumonia": 0.0,
73
+ "Atelectasis": 0.0,
74
+ "Pneumothorax": 0.0,
75
+ "Pleural Effusion": 0.0,
76
+ "Pleural Other": 0.0,
77
+ "Fracture": 0.0,
78
+ "Support Devices": 0.5660377358490566,
79
+ "No Finding": 0.29349676540687775
80
+ },
81
+ "radgraph_f1": 0.0813727768314203,
82
+ "radgraph_f1_entity": 0.1452824713307004,
83
+ "radgraph_f1_relation": 0.13105287501866947,
84
+ "radgraph_available": true,
85
+ "radgraph_error": null
86
+ },
87
+ "latest_evaluations": {
88
+ "all_test": {
89
+ "split": "test",
90
+ "subset": "all frontal studies",
91
+ "dataset": "mimic-cxr",
92
+ "view_filter": "frontal-only (PA/AP)",
93
+ "num_examples": 3041,
94
+ "bleu_1": 0.22660013916205363,
95
+ "bleu_4": 0.044065714234343675,
96
+ "meteor": 0.20172974056236193,
97
+ "rouge_l": 0.16930790037433466,
98
+ "chexpert_f1_14_micro": 0.22114848541163354,
99
+ "chexpert_f1_5_micro": 0.05919661733615222,
100
+ "chexpert_f1_14_macro": 0.0760104965288089,
101
+ "chexpert_f1_5_macro": 0.035443037974683546,
102
+ "chexpert_f1_micro": 0.22114848541163354,
103
+ "chexpert_f1_macro": 0.0760104965288089,
104
+ "chexpert_per_label_f1": {
105
+ "Enlarged Cardiomediastinum": 0.0273972602739726,
106
+ "Cardiomegaly": 0.17721518987341772,
107
+ "Lung Opacity": 0.0,
108
+ "Lung Lesion": 0.0,
109
+ "Edema": 0.0,
110
+ "Consolidation": 0.0,
111
+ "Pneumonia": 0.0,
112
+ "Atelectasis": 0.0,
113
+ "Pneumothorax": 0.0,
114
+ "Pleural Effusion": 0.0,
115
+ "Pleural Other": 0.0,
116
+ "Fracture": 0.0,
117
+ "Support Devices": 0.5660377358490566,
118
+ "No Finding": 0.29349676540687775
119
+ },
120
+ "radgraph_f1": 0.0813727768314203,
121
+ "radgraph_f1_entity": 0.1452824713307004,
122
+ "radgraph_f1_relation": 0.13105287501866947,
123
+ "radgraph_available": true,
124
+ "radgraph_error": null
125
+ },
126
+ "findings_only_test": {
127
+ "split": "test",
128
+ "subset": "findings-only frontal studies",
129
+ "dataset": "mimic-cxr",
130
+ "view_filter": "frontal-only (PA/AP), structured Findings section only",
131
+ "num_examples": 2210,
132
+ "bleu_1": 0.2365366145096764,
133
+ "bleu_4": 0.05092983875760064,
134
+ "meteor": 0.21583177087647024,
135
+ "rouge_l": 0.17757622652257565,
136
+ "chexpert_f1_14_micro": 0.22206742825299527,
137
+ "chexpert_f1_5_micro": 0.06491372226787182,
138
+ "chexpert_f1_14_macro": 0.07578923935276514,
139
+ "chexpert_f1_5_macro": 0.03816425120772947,
140
+ "chexpert_f1_micro": 0.22206742825299527,
141
+ "chexpert_f1_macro": 0.07578923935276514,
142
+ "chexpert_per_label_f1": {
143
+ "Enlarged Cardiomediastinum": 0.028169014084507043,
144
+ "Cardiomegaly": 0.19082125603864733,
145
+ "Lung Opacity": 0.0,
146
+ "Lung Lesion": 0.0,
147
+ "Edema": 0.0,
148
+ "Consolidation": 0.0,
149
+ "Pneumonia": 0.0,
150
+ "Atelectasis": 0.0,
151
+ "Pneumothorax": 0.0,
152
+ "Pleural Effusion": 0.0,
153
+ "Pleural Other": 0.0,
154
+ "Fracture": 0.0,
155
+ "Support Devices": 0.4929681717246484,
156
+ "No Finding": 0.3490909090909091
157
+ },
158
+ "radgraph_f1": 0.09257418507561303,
159
+ "radgraph_f1_entity": 0.15825019075973226,
160
+ "radgraph_f1_relation": 0.1427606782830153,
161
+ "radgraph_available": true,
162
+ "radgraph_error": null
163
+ }
164
+ }
165
  }
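The macro CheXpert scores above are consistent with a plain average of the per-label F1 values, which explains why they are so low: most labels score `0.0`. The sketch below checks this against the all-frontal `LAnA-v4` numbers. The `macro_f1` helper is hypothetical, and the five-label subset (Atelectasis, Cardiomegaly, Consolidation, Edema, Pleural Effusion) is an assumption based on the usual CheXpert convention; the commit does not list it, although the arithmetic matches the reported `chexpert_f1_5_macro`.

```python
from typing import Dict, Iterable, Optional

# Per-label F1 values copied from the all-frontal test metrics in this commit.
per_label_f1: Dict[str, float] = {
    "Enlarged Cardiomediastinum": 0.0273972602739726,
    "Cardiomegaly": 0.17721518987341772,
    "Lung Opacity": 0.0,
    "Lung Lesion": 0.0,
    "Edema": 0.0,
    "Consolidation": 0.0,
    "Pneumonia": 0.0,
    "Atelectasis": 0.0,
    "Pneumothorax": 0.0,
    "Pleural Effusion": 0.0,
    "Pleural Other": 0.0,
    "Fracture": 0.0,
    "Support Devices": 0.5660377358490566,
    "No Finding": 0.29349676540687775,
}

# Assumed five-label subset (not stated in the commit).
CHEXPERT_5 = ("Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion")


def macro_f1(scores: Dict[str, float], labels: Optional[Iterable[str]] = None) -> float:
    """Unweighted mean of per-label F1 over the chosen labels (all labels by default)."""
    chosen = list(labels) if labels is not None else list(scores)
    return sum(scores[name] for name in chosen) / len(chosen)


macro_14 = macro_f1(per_label_f1)
macro_5 = macro_f1(per_label_f1, CHEXPERT_5)

# Compare against the reported values with a small floating-point tolerance.
assert abs(macro_14 - 0.0760104965288089) < 1e-12
assert abs(macro_5 - 0.035443037974683546) < 1e-12
```

Micro F1 cannot be recovered this way, because it pools true and false positives across labels and the per-label counts are not part of the saved metrics.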