manu02 committed on
Commit cac40db · verified · 1 parent: 068cd17

Upload MIMIC test evaluation results

README.md CHANGED
@@ -29,14 +29,6 @@ metrics:

  ![Layer-Wise Anatomical Attention](assets/AnatomicalAttention.gif)

- ## Status
-
- - Project status: `Training in progress`
- - Release status: `Research preview checkpoint`
- - Current checkpoint status: `Not final`
- - Training completion toward planned run: `100.00%` (`4.000` / `3` epochs)
- - Current published metrics are intermediate and will change as training continues.
-
  ## Overview

  LAnA is a medical report-generation project for chest X-ray images. The completed project is intended to generate radiology reports with a vision-language model guided by layer-wise anatomical attention built from predicted anatomical masks.
@@ -45,82 +37,6 @@ The architecture combines a DINOv3 vision encoder, lung and heart segmentation h

  ## How to Run

- For local inference instructions, go to the [Inference](#inference) section.
-
- ## Intended Use
-
- - Input: a chest X-ray image resized to `512x512` and normalized with ImageNet mean/std.
- - Output: a generated radiology report.
- - Best fit: research use, report-generation experiments, and anatomical-attention ablations.
-
- ## Data
-
- - Full project datasets: CheXpert and MIMIC-CXR.
- - Intended project scope: train on curated chest X-ray/report data from both datasets and evaluate on MIMIC-CXR test studies.
- - Current released checkpoint datasets: `CheXpert, MIMIC-CXR` for training and `CheXpert, MIMIC-CXR` for validation.
- - Current published evaluation: MIMIC-CXR test split, `frontal-only (PA/AP)` studies.
-
- ## Evaluation
-
- - Text-generation metrics used in this project include BLEU, METEOR, ROUGE, and CIDEr.
- - Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
-
- ## Training Snapshot
-
- - Run: `full_3_epoch_mask_run`
- - This section describes the current public checkpoint, not the final completed project.
- - Method: `lora_adamw`
- - Vision encoder: `facebook/dinov3-vits16-pretrain-lvd1689m`
- - Text decoder: `gpt2`
- - Segmentation encoder: `facebook/dinov3-convnext-small-pretrain-lvd1689m`
- - Image size: `512`
- - Local batch size: `1`
- - Effective global batch size: `8`
- - Scheduler: `cosine`
- - Warmup steps: `5114`
- - Weight decay: `0.01`
- - Steps completed: `102264`
- - Planned total steps: `102276`
- - Images seen: `818196`
- - Total training time: `23.5798` hours
- - Hardware: `NVIDIA GeForce RTX 5070`
- - Final train loss: `1.1683`
- - Validation loss: `1.3692`
-
- ## MIMIC Test Results
-
- Frontal-only evaluation using `PA/AP` studies only.
-
- ### Current Checkpoint Results
-
- | Metric | Value |
- | --- | --- |
- | Number of studies | `3041` |
- | RadGraph F1 | `0.0918` |
- | RadGraph entity F1 | `0.1399` |
- | RadGraph relation F1 | `0.1246` |
- | CheXpert F1 14-micro | `0.1829` |
- | CheXpert F1 5-micro | `0.2183` |
- | CheXpert F1 14-macro | `0.1095` |
- | CheXpert F1 5-macro | `0.1634` |
-
- ### Final Completed Training Results
-
- The final table will be populated when the planned training run is completed. Until then, final-report metrics remain `TBD`.
-
- | Metric | Value |
- | --- | --- |
- | Number of studies | TBD |
- | RadGraph F1 | TBD |
- | RadGraph entity F1 | TBD |
- | RadGraph relation F1 | TBD |
- | CheXpert F1 14-micro | TBD |
- | CheXpert F1 5-micro | TBD |
- | CheXpert F1 14-macro | TBD |
- | CheXpert F1 5-macro | TBD |
-
- ## Inference
-
  Standard `AutoModel.from_pretrained(..., trust_remote_code=True)` loading is currently blocked for this repo because the custom model constructor performs nested pretrained submodel loads.
  Use the verified manual load path below instead: download the HF repo snapshot, import the downloaded package, and load the exported `model.safetensors` directly.

@@ -171,27 +87,88 @@ report = model.tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
  print(report)
  ```

+ ## Intended Use
+
+ - Input: a chest X-ray image resized to `512x512` and normalized with ImageNet mean/std.
+ - Output: a generated radiology report.
+ - Best fit: research use, report-generation experiments, and anatomical-attention ablations.
+
+ ## MIMIC Test Results
+
+ Frontal-only evaluation using `PA/AP` studies only.
+
+ ### Final Completed Training Results
+
+ These final-report metrics correspond to the completed training run.
+
+ ### All Frontal Test Studies
+
+ | Metric | Value |
+ | --- | --- |
+ | Number of studies | `3041` |
+ | RadGraph F1 | `0.0918` |
+ | RadGraph entity F1 | `0.1399` |
+ | RadGraph relation F1 | `0.1246` |
+ | CheXpert F1 14-micro | `0.1829` |
+ | CheXpert F1 5-micro | `0.2183` |
+ | CheXpert F1 14-macro | `0.1095` |
+ | CheXpert F1 5-macro | `0.1634` |
+
+ ### Findings-Only Frontal Test Studies
+
+ | Metric | Value |
+ | --- | --- |
+ | Number of studies | `2210` |
+ | RadGraph F1 | `0.1010` |
+ | RadGraph entity F1 | `0.1517` |
+ | RadGraph relation F1 | `0.1347` |
+ | CheXpert F1 14-micro | `0.1651` |
+ | CheXpert F1 5-micro | `0.2152` |
+ | CheXpert F1 14-macro | `0.1047` |
+ | CheXpert F1 5-macro | `0.1611` |
+
+ ## Data
+
+ - Full project datasets: CheXpert and MIMIC-CXR.
+ - Intended project scope: train on curated chest X-ray/report data from both datasets and evaluate on MIMIC-CXR test studies.
+ - Current released checkpoint datasets: `CheXpert, MIMIC-CXR` for training and `CheXpert, MIMIC-CXR` for validation.
+ - Current published evaluation: MIMIC-CXR test split, `frontal-only (PA/AP)` studies.
+
+ ## Evaluation
+
+ - Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
+
+ ## Training Snapshot
+
+ - Run: `full_3_epoch_mask_run`
+ - This section describes the completed public training run.
+ - Method: `lora_adamw`
+ - Vision encoder: `facebook/dinov3-vits16-pretrain-lvd1689m`
+ - Text decoder: `gpt2`
+ - Segmentation encoder: `facebook/dinov3-convnext-small-pretrain-lvd1689m`
+ - Image size: `512`
+ - Local batch size: `1`
+ - Effective global batch size: `8`
+ - Scheduler: `cosine`
+ - Warmup steps: `5114`
+ - Weight decay: `0.01`
+ - Steps completed: `102264`
+ - Planned total steps: `102276`
+ - Images seen: `818196`
+ - Total training time: `23.5798` hours
+ - Hardware: `NVIDIA GeForce RTX 5070`
+ - Final train loss: `1.1683`
+ - Validation loss: `1.3692`
+
+ ## Status
+
+ - Project status: `Training completed`
+ - Release status: `Completed training run`
+ - Current checkpoint status: `Final completed run`
+ - Training completion toward planned run: `100.00%` (`3` / `3` epochs)
+ - Current published metrics correspond to the completed training run.
+
  ## Notes

  - `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
  - `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
-
- <!-- EVAL_RESULTS_START -->
- ## Latest Evaluation
-
- - Dataset: `MIMIC-CXR test`
- - View filter: `frontal-only (PA/AP)`
- - Number of examples: `3041`
- - CheXpert F1 14-micro: `0.1829`
- - CheXpert F1 5-micro: `0.2183`
- - CheXpert F1 14-macro: `0.1095`
- - CheXpert F1 5-macro: `0.1634`
- - RadGraph F1: `0.0918`
- - RadGraph entity F1: `0.1399`
- - RadGraph relation F1: `0.1246`
- - RadGraph available: `True`
- - RadGraph error: `None`
-
- - Evaluation file: `evaluations/mimic_test_metrics.json`
- - Predictions file: `evaluations/mimic_test_predictions.csv`
- <!-- EVAL_RESULTS_END -->

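
The Intended Use section added in this commit specifies the model input as a chest X-ray resized to `512x512` and normalized with ImageNet mean/std. Here is a minimal preprocessing sketch. It assumes the standard torchvision ImageNet statistics, bilinear resizing, grayscale replicated to three channels, and a channel-first float32 layout. `preprocess_cxr` is a hypothetical name. The repository's own inference code, which is only partly visible in this diff, may differ in any of these details.

```python
import numpy as np
from PIL import Image

# Standard ImageNet per-channel statistics (the values torchvision uses).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_cxr(image: Image.Image, size: int = 512) -> np.ndarray:
    """Resize a chest X-ray to size x size and normalize with ImageNet mean/std.

    Assumptions: bilinear resampling, grayscale replicated to RGB, and a
    float32 output in channel-first (3, size, size) layout.
    """
    # Radiographs are usually single-channel; convert("RGB") copies it to 3 channels.
    rgb = image.convert("RGB").resize((size, size), Image.BILINEAR)
    pixels = np.asarray(rgb, dtype=np.float32) / 255.0    # (H, W, 3), values in [0, 1]
    normalized = (pixels - IMAGENET_MEAN) / IMAGENET_STD   # broadcasts over the channel axis
    return normalized.transpose(2, 0, 1)                   # (3, H, W)
```

For a single study, add a batch axis with `x[None]` and convert it to whatever tensor type the loaded model expects.
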
evaluations/mimic_test_findings_only_metrics.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "split": "test",
+   "subset": "findings-only frontal studies",
+   "dataset": "mimic-cxr",
+   "view_filter": "frontal-only (PA/AP), structured Findings section only",
+   "num_examples": 2210,
+   "chexpert_f1_14_micro": 0.16506270049577138,
+   "chexpert_f1_5_micro": 0.21520692974013475,
+   "chexpert_f1_14_macro": 0.10472446617305661,
+   "chexpert_f1_5_macro": 0.16106779379149633,
+   "chexpert_f1_micro": 0.16506270049577138,
+   "chexpert_f1_macro": 0.10472446617305661,
+   "chexpert_per_label_f1": {
+     "Enlarged Cardiomediastinum": 0.0,
+     "Cardiomegaly": 0.09737827715355805,
+     "Lung Opacity": 0.0,
+     "Lung Lesion": 0.0,
+     "Edema": 0.27852998065764023,
+     "Consolidation": 0.0667384284176534,
+     "Pneumonia": 0.1375796178343949,
+     "Atelectasis": 0.0482897384305835,
+     "Pneumothorax": 0.021455938697318006,
+     "Pleural Effusion": 0.31440254429804637,
+     "Pleural Other": 0.0,
+     "Fracture": 0.06052631578947368,
+     "Support Devices": 0.4412416851441242,
+     "No Finding": 0.0
+   },
+   "radgraph_f1": 0.10102933280223365,
+   "radgraph_f1_entity": 0.15171508935265537,
+   "radgraph_f1_relation": 0.13465579667248295,
+   "radgraph_available": true,
+   "radgraph_error": null
+ }
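
In these metrics files, `chexpert_f1_14_macro` and `chexpert_f1_5_macro` match unweighted means of `chexpert_per_label_f1`. The 14-label score averages all 14 labels. The 5-label score averages the five CheXpert competition labels: Atelectasis, Cardiomegaly, Consolidation, Edema, and Pleural Effusion. That five-label subset is inferred from the numbers in this commit; the files do not name it. The sketch below recomputes both macro values as a consistency check, and its helper names are hypothetical. Micro scores pool true positives, false positives, and false negatives across labels, so they cannot be recovered from per-label F1 alone and are not checked here.

```python
import json
import math
from typing import Dict, Iterable, Optional

# Assumption: the "5" CheXpert metrics use these five competition labels.
# The 5-macro values in this commit equal the mean F1 over exactly this subset.
CHEXPERT_5_LABELS = (
    "Atelectasis",
    "Cardiomegaly",
    "Consolidation",
    "Edema",
    "Pleural Effusion",
)


def macro_f1(per_label_f1: Dict[str, float], labels: Optional[Iterable[str]] = None) -> float:
    """Unweighted mean of per-label F1 over `labels` (default: every label present)."""
    keys = list(per_label_f1) if labels is None else list(labels)
    return sum(per_label_f1[key] for key in keys) / len(keys)


def check_macro_scores(metrics: dict, tol: float = 1e-9) -> Dict[str, bool]:
    """Compare the reported macro scores with values recomputed from per-label F1."""
    per_label = metrics["chexpert_per_label_f1"]
    recomputed = {
        "chexpert_f1_14_macro": macro_f1(per_label),
        "chexpert_f1_5_macro": macro_f1(per_label, CHEXPERT_5_LABELS),
    }
    return {
        name: math.isclose(value, metrics[name], rel_tol=0.0, abs_tol=tol)
        for name, value in recomputed.items()
    }


def load_metrics(path: str) -> dict:
    """Read one metrics JSON file, e.g. evaluations/mimic_test_findings_only_metrics.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

For the values in this commit, `check_macro_scores(load_metrics("evaluations/mimic_test_findings_only_metrics.json"))` reports `True` for both keys.
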
evaluations/mimic_test_findings_only_predictions.csv ADDED
The diff for this file is too large to render.
 
evaluations/mimic_test_metrics.json CHANGED
@@ -1,5 +1,6 @@
  {
    "split": "test",
+   "subset": "all frontal studies",
    "dataset": "mimic-cxr",
    "view_filter": "frontal-only (PA/AP)",
    "num_examples": 3041,
@@ -29,5 +30,74 @@
    "radgraph_f1_entity": 0.13993790644379023,
    "radgraph_f1_relation": 0.12464719867951028,
    "radgraph_available": true,
-   "radgraph_error": null
+   "radgraph_error": null,
+   "evaluation_suite": "mimic_test_dual",
+   "all_test": {
+     "split": "test",
+     "subset": "all frontal studies",
+     "dataset": "mimic-cxr",
+     "view_filter": "frontal-only (PA/AP)",
+     "num_examples": 3041,
+     "chexpert_f1_14_micro": 0.18291666666666664,
+     "chexpert_f1_5_micro": 0.21831082003001773,
+     "chexpert_f1_14_macro": 0.10945797832551928,
+     "chexpert_f1_5_macro": 0.1633553219570594,
+     "chexpert_f1_micro": 0.18291666666666664,
+     "chexpert_f1_macro": 0.10945797832551928,
+     "chexpert_per_label_f1": {
+       "Enlarged Cardiomediastinum": 0.0,
+       "Cardiomegaly": 0.10195227765726682,
+       "Lung Opacity": 0.0020470829068577278,
+       "Lung Lesion": 0.0,
+       "Edema": 0.2789757412398922,
+       "Consolidation": 0.06424344885883347,
+       "Pneumonia": 0.14311926605504585,
+       "Atelectasis": 0.0428380187416332,
+       "Pneumothorax": 0.030358227079538558,
+       "Pleural Effusion": 0.32876712328767127,
+       "Pleural Other": 0.0,
+       "Fracture": 0.0633879781420765,
+       "Support Devices": 0.4767225325884544,
+       "No Finding": 0.0
+     },
+     "radgraph_f1": 0.09181957971495504,
+     "radgraph_f1_entity": 0.13993790644379023,
+     "radgraph_f1_relation": 0.12464719867951028,
+     "radgraph_available": true,
+     "radgraph_error": null
+   },
+   "findings_only_test": {
+     "split": "test",
+     "subset": "findings-only frontal studies",
+     "dataset": "mimic-cxr",
+     "view_filter": "frontal-only (PA/AP), structured Findings section only",
+     "num_examples": 2210,
+     "chexpert_f1_14_micro": 0.16506270049577138,
+     "chexpert_f1_5_micro": 0.21520692974013475,
+     "chexpert_f1_14_macro": 0.10472446617305661,
+     "chexpert_f1_5_macro": 0.16106779379149633,
+     "chexpert_f1_micro": 0.16506270049577138,
+     "chexpert_f1_macro": 0.10472446617305661,
+     "chexpert_per_label_f1": {
+       "Enlarged Cardiomediastinum": 0.0,
+       "Cardiomegaly": 0.09737827715355805,
+       "Lung Opacity": 0.0,
+       "Lung Lesion": 0.0,
+       "Edema": 0.27852998065764023,
+       "Consolidation": 0.0667384284176534,
+       "Pneumonia": 0.1375796178343949,
+       "Atelectasis": 0.0482897384305835,
+       "Pneumothorax": 0.021455938697318006,
+       "Pleural Effusion": 0.31440254429804637,
+       "Pleural Other": 0.0,
+       "Fracture": 0.06052631578947368,
+       "Support Devices": 0.4412416851441242,
+       "No Finding": 0.0
+     },
+     "radgraph_f1": 0.10102933280223365,
+     "radgraph_f1_entity": 0.15171508935265537,
+     "radgraph_f1_relation": 0.13465579667248295,
+     "radgraph_available": true,
+     "radgraph_error": null
+   }
  }
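
After this change, `evaluations/mimic_test_metrics.json` keeps the top-level all-frontal metrics. It also adds `"evaluation_suite": "mimic_test_dual"` with two nested results: `all_test` (3041 studies) and `findings_only_test` (2210 studies). The sketch below tabulates the headline metrics side by side. Only the key names come from the file; `HEADLINE_METRICS`, `compare_subsets`, `format_comparison`, and `load_suite` are hypothetical helpers.

```python
import json
from typing import Dict, Tuple

# Metric keys present in both nested results of the dual evaluation suite.
HEADLINE_METRICS = (
    "radgraph_f1",
    "radgraph_f1_entity",
    "radgraph_f1_relation",
    "chexpert_f1_14_micro",
    "chexpert_f1_5_micro",
    "chexpert_f1_14_macro",
    "chexpert_f1_5_macro",
)


def compare_subsets(suite: dict) -> Dict[str, Tuple[float, float, float]]:
    """Map each metric to (all_test, findings_only_test, findings minus all)."""
    all_test = suite["all_test"]
    findings = suite["findings_only_test"]
    return {
        name: (all_test[name], findings[name], findings[name] - all_test[name])
        for name in HEADLINE_METRICS
    }


def format_comparison(rows: Dict[str, Tuple[float, float, float]]) -> str:
    """Render the comparison as a fixed-width text table with a signed delta column."""
    lines = [f"{'metric':<24}{'all':>8}{'findings':>10}{'delta':>9}"]
    for name, (all_value, findings_value, delta) in rows.items():
        lines.append(f"{name:<24}{all_value:>8.4f}{findings_value:>10.4f}{delta:>+9.4f}")
    return "\n".join(lines)


def load_suite(path: str = "evaluations/mimic_test_metrics.json") -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Typical use is `print(format_comparison(compare_subsets(load_suite())))`. Keep in mind that the two subsets contain different numbers of studies.
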
evaluations/mimic_test_predictions.csv CHANGED
The diff for this file is too large to render.
 
run_summary.json CHANGED
@@ -42,6 +42,7 @@
    "validation_datasets": "CheXpert, MIMIC-CXR",
    "latest_evaluation": {
      "split": "test",
+     "subset": "all frontal studies",
      "dataset": "mimic-cxr",
      "view_filter": "frontal-only (PA/AP)",
      "num_examples": 3041,
@@ -72,5 +73,75 @@
      "radgraph_f1_relation": 0.12464719867951028,
      "radgraph_available": true,
      "radgraph_error": null
+   },
+   "latest_evaluations": {
+     "all_test": {
+       "split": "test",
+       "subset": "all frontal studies",
+       "dataset": "mimic-cxr",
+       "view_filter": "frontal-only (PA/AP)",
+       "num_examples": 3041,
+       "chexpert_f1_14_micro": 0.18291666666666664,
+       "chexpert_f1_5_micro": 0.21831082003001773,
+       "chexpert_f1_14_macro": 0.10945797832551928,
+       "chexpert_f1_5_macro": 0.1633553219570594,
+       "chexpert_f1_micro": 0.18291666666666664,
+       "chexpert_f1_macro": 0.10945797832551928,
+       "chexpert_per_label_f1": {
+         "Enlarged Cardiomediastinum": 0.0,
+         "Cardiomegaly": 0.10195227765726682,
+         "Lung Opacity": 0.0020470829068577278,
+         "Lung Lesion": 0.0,
+         "Edema": 0.2789757412398922,
+         "Consolidation": 0.06424344885883347,
+         "Pneumonia": 0.14311926605504585,
+         "Atelectasis": 0.0428380187416332,
+         "Pneumothorax": 0.030358227079538558,
+         "Pleural Effusion": 0.32876712328767127,
+         "Pleural Other": 0.0,
+         "Fracture": 0.0633879781420765,
+         "Support Devices": 0.4767225325884544,
+         "No Finding": 0.0
+       },
+       "radgraph_f1": 0.09181957971495504,
+       "radgraph_f1_entity": 0.13993790644379023,
+       "radgraph_f1_relation": 0.12464719867951028,
+       "radgraph_available": true,
+       "radgraph_error": null
+     },
+     "findings_only_test": {
+       "split": "test",
+       "subset": "findings-only frontal studies",
+       "dataset": "mimic-cxr",
+       "view_filter": "frontal-only (PA/AP), structured Findings section only",
+       "num_examples": 2210,
+       "chexpert_f1_14_micro": 0.16506270049577138,
+       "chexpert_f1_5_micro": 0.21520692974013475,
+       "chexpert_f1_14_macro": 0.10472446617305661,
+       "chexpert_f1_5_macro": 0.16106779379149633,
+       "chexpert_f1_micro": 0.16506270049577138,
+       "chexpert_f1_macro": 0.10472446617305661,
+       "chexpert_per_label_f1": {
+         "Enlarged Cardiomediastinum": 0.0,
+         "Cardiomegaly": 0.09737827715355805,
+         "Lung Opacity": 0.0,
+         "Lung Lesion": 0.0,
+         "Edema": 0.27852998065764023,
+         "Consolidation": 0.0667384284176534,
+         "Pneumonia": 0.1375796178343949,
+         "Atelectasis": 0.0482897384305835,
+         "Pneumothorax": 0.021455938697318006,
+         "Pleural Effusion": 0.31440254429804637,
+         "Pleural Other": 0.0,
+         "Fracture": 0.06052631578947368,
+         "Support Devices": 0.4412416851441242,
+         "No Finding": 0.0
+       },
+       "radgraph_f1": 0.10102933280223365,
+       "radgraph_f1_entity": 0.15171508935265537,
+       "radgraph_f1_relation": 0.13465579667248295,
+       "radgraph_available": true,
+       "radgraph_error": null
+     }
    }
  }
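
`run_summary.json` now contains both the original `latest_evaluation` object, now tagged `"subset": "all frontal studies"`, and a new `latest_evaluations` mapping keyed by `all_test` and `findings_only_test`. Code that reads summaries written both before and after this commit could prefer the new mapping and fall back to the single legacy entry. This is only a sketch. The function names are hypothetical, and so is the choice to label the legacy entry `all_test`; in this commit the legacy entry does describe the same 3041-study all-frontal subset.

```python
import json
from typing import Any, Dict


def latest_evaluations(run_summary: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return evaluation results keyed by subset name.

    Prefers the "latest_evaluations" mapping and falls back to wrapping the
    older single "latest_evaluation" object under the key "all_test".
    """
    if "latest_evaluations" in run_summary:
        return dict(run_summary["latest_evaluations"])
    if "latest_evaluation" in run_summary:
        # Assumption: the legacy entry covers all frontal test studies.
        return {"all_test": run_summary["latest_evaluation"]}
    return {}


def load_run_summary(path: str = "run_summary.json") -> Dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With a summary from this commit, `latest_evaluations(load_run_summary())` returns both subsets. With an older summary, it returns only `all_test`.
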