Upload MIMIC test evaluation results
README.md
CHANGED
|
@@ -29,14 +29,6 @@ metrics:
|
|
| 29 |
|
| 30 |

|
| 31 |
|
| 32 |
-
## Status
|
| 33 |
-
|
| 34 |
-
- Project status: `Training in progress`
|
| 35 |
-
- Release status: `Research preview checkpoint`
|
| 36 |
-
- Current checkpoint status: `Not final`
|
| 37 |
-
- Training completion toward planned run: `100.00%` (`4.000` / `3` epochs)
|
| 38 |
-
- Current published metrics are intermediate and will change as training continues.
|
| 39 |
-
|
| 40 |
## Overview
|
| 41 |
|
| 42 |
LAnA is a medical report-generation project for chest X-ray images. The completed project is intended to generate radiology reports with a vision-language model guided by layer-wise anatomical attention built from predicted anatomical masks.
|
|
@@ -45,82 +37,6 @@ The architecture combines a DINOv3 vision encoder, lung and heart segmentation h
|
|
| 45 |
|
| 46 |
## How to Run
|
| 47 |
|
| 48 |
-
For local inference instructions, go to the [Inference](#inference) section.
|
| 49 |
-
|
| 50 |
-
## Intended Use
|
| 51 |
-
|
| 52 |
-
- Input: a chest X-ray image resized to `512x512` and normalized with ImageNet mean/std.
|
| 53 |
-
- Output: a generated radiology report.
|
| 54 |
-
- Best fit: research use, report-generation experiments, and anatomical-attention ablations.
|
| 55 |
-
|
| 56 |
-
## Data
|
| 57 |
-
|
| 58 |
-
- Full project datasets: CheXpert and MIMIC-CXR.
|
| 59 |
-
- Intended project scope: train on curated chest X-ray/report data from both datasets and evaluate on MIMIC-CXR test studies.
|
| 60 |
-
- Current released checkpoint datasets: `CheXpert, MIMIC-CXR` for training and `CheXpert, MIMIC-CXR` for validation.
|
| 61 |
-
- Current published evaluation: MIMIC-CXR test split, `frontal-only (PA/AP)` studies.
|
| 62 |
-
|
| 63 |
-
## Evaluation
|
| 64 |
-
|
| 65 |
-
- Text-generation metrics used in this project include BLEU, METEOR, ROUGE, and CIDEr.
|
| 66 |
-
- Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
|
| 67 |
-
|
| 68 |
-
## Training Snapshot
|
| 69 |
-
|
| 70 |
-
- Run: `full_3_epoch_mask_run`
|
| 71 |
-
- This section describes the current public checkpoint, not the final completed project.
|
| 72 |
-
- Method: `lora_adamw`
|
| 73 |
-
- Vision encoder: `facebook/dinov3-vits16-pretrain-lvd1689m`
|
| 74 |
-
- Text decoder: `gpt2`
|
| 75 |
-
- Segmentation encoder: `facebook/dinov3-convnext-small-pretrain-lvd1689m`
|
| 76 |
-
- Image size: `512`
|
| 77 |
-
- Local batch size: `1`
|
| 78 |
-
- Effective global batch size: `8`
|
| 79 |
-
- Scheduler: `cosine`
|
| 80 |
-
- Warmup steps: `5114`
|
| 81 |
-
- Weight decay: `0.01`
|
| 82 |
-
- Steps completed: `102264`
|
| 83 |
-
- Planned total steps: `102276`
|
| 84 |
-
- Images seen: `818196`
|
| 85 |
-
- Total training time: `23.5798` hours
|
| 86 |
-
- Hardware: `NVIDIA GeForce RTX 5070`
|
| 87 |
-
- Final train loss: `1.1683`
|
| 88 |
-
- Validation loss: `1.3692`
|
| 89 |
-
|
| 90 |
-
## MIMIC Test Results
|
| 91 |
-
|
| 92 |
-
Frontal-only evaluation using `PA/AP` studies only.
|
| 93 |
-
|
| 94 |
-
### Current Checkpoint Results
|
| 95 |
-
|
| 96 |
-
| Metric | Value |
|
| 97 |
-
| --- | --- |
|
| 98 |
-
| Number of studies | `3041` |
|
| 99 |
-
| RadGraph F1 | `0.0918` |
|
| 100 |
-
| RadGraph entity F1 | `0.1399` |
|
| 101 |
-
| RadGraph relation F1 | `0.1246` |
|
| 102 |
-
| CheXpert F1 14-micro | `0.1829` |
|
| 103 |
-
| CheXpert F1 5-micro | `0.2183` |
|
| 104 |
-
| CheXpert F1 14-macro | `0.1095` |
|
| 105 |
-
| CheXpert F1 5-macro | `0.1634` |
|
| 106 |
-
|
| 107 |
-
### Final Completed Training Results
|
| 108 |
-
|
| 109 |
-
The final table will be populated when the planned training run is completed. Until then, final-report metrics remain `TBD`.
|
| 110 |
-
|
| 111 |
-
| Metric | Value |
|
| 112 |
-
| --- | --- |
|
| 113 |
-
| Number of studies | TBD |
|
| 114 |
-
| RadGraph F1 | TBD |
|
| 115 |
-
| RadGraph entity F1 | TBD |
|
| 116 |
-
| RadGraph relation F1 | TBD |
|
| 117 |
-
| CheXpert F1 14-micro | TBD |
|
| 118 |
-
| CheXpert F1 5-micro | TBD |
|
| 119 |
-
| CheXpert F1 14-macro | TBD |
|
| 120 |
-
| CheXpert F1 5-macro | TBD |
|
| 121 |
-
|
| 122 |
-
## Inference
|
| 123 |
-
|
| 124 |
Standard `AutoModel.from_pretrained(..., trust_remote_code=True)` loading is currently blocked for this repo because the custom model constructor performs nested pretrained submodel loads.
|
| 125 |
Use the verified manual load path below instead: download the HF repo snapshot, import the downloaded package, and load the exported `model.safetensors` directly.
|
| 126 |
|
|
@@ -171,27 +87,88 @@ report = model.tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
|
|
| 171 |
print(report)
|
| 172 |
```
|
| 173 |
|
| 174 |
## Notes
|
| 175 |
|
| 176 |
- `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
|
| 177 |
- `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
|
| 178 |
-
|
| 179 |
-
<!-- EVAL_RESULTS_START -->
|
| 180 |
-
## Latest Evaluation
|
| 181 |
-
|
| 182 |
-
- Dataset: `MIMIC-CXR test`
|
| 183 |
-
- View filter: `frontal-only (PA/AP)`
|
| 184 |
-
- Number of examples: `3041`
|
| 185 |
-
- CheXpert F1 14-micro: `0.1829`
|
| 186 |
-
- CheXpert F1 5-micro: `0.2183`
|
| 187 |
-
- CheXpert F1 14-macro: `0.1095`
|
| 188 |
-
- CheXpert F1 5-macro: `0.1634`
|
| 189 |
-
- RadGraph F1: `0.0918`
|
| 190 |
-
- RadGraph entity F1: `0.1399`
|
| 191 |
-
- RadGraph relation F1: `0.1246`
|
| 192 |
-
- RadGraph available: `True`
|
| 193 |
-
- RadGraph error: `None`
|
| 194 |
-
|
| 195 |
-
- Evaluation file: `evaluations/mimic_test_metrics.json`
|
| 196 |
-
- Predictions file: `evaluations/mimic_test_predictions.csv`
|
| 197 |
-
<!-- EVAL_RESULTS_END -->
|
|
|
|
| 29 |
|
| 30 |

|
| 31 |
|
| 32 |
## Overview
|
| 33 |
|
| 34 |
LAnA is a medical report-generation project for chest X-ray images. The completed project is intended to generate radiology reports with a vision-language model guided by layer-wise anatomical attention built from predicted anatomical masks.
|
|
|
|
| 37 |
|
| 38 |
## How to Run
|
| 39 |
|
| 40 |
Standard `AutoModel.from_pretrained(..., trust_remote_code=True)` loading is currently blocked for this repo because the custom model constructor performs nested pretrained submodel loads.
|
| 41 |
Use the verified manual load path below instead: download the HF repo snapshot, import the downloaded package, and load the exported `model.safetensors` directly.
|
| 42 |
|
|
|
|
| 87 |
print(report)
|
| 88 |
```
|
| 89 |
|
| 90 |
+
## Intended Use
|
| 91 |
+
|
| 92 |
+
- Input: a chest X-ray image resized to `512x512` and normalized with ImageNet mean/std.
|
| 93 |
+
- Output: a generated radiology report.
|
| 94 |
+
- Best fit: research use, report-generation experiments, and anatomical-attention ablations.
|
| 95 |
+
|
| 96 |
+
## MIMIC Test Results
|
| 97 |
+
|
| 98 |
+
Frontal-only evaluation using `PA/AP` studies only.
|
| 99 |
+
|
| 100 |
+
### Final Completed Training Results
|
| 101 |
+
|
| 102 |
+
These final-report metrics correspond to the completed training run.
|
| 103 |
+
|
| 104 |
+
### All Frontal Test Studies
|
| 105 |
+
|
| 106 |
+
| Metric | Value |
|
| 107 |
+
| --- | --- |
|
| 108 |
+
| Number of studies | `3041` |
|
| 109 |
+
| RadGraph F1 | `0.0918` |
|
| 110 |
+
| RadGraph entity F1 | `0.1399` |
|
| 111 |
+
| RadGraph relation F1 | `0.1246` |
|
| 112 |
+
| CheXpert F1 14-micro | `0.1829` |
|
| 113 |
+
| CheXpert F1 5-micro | `0.2183` |
|
| 114 |
+
| CheXpert F1 14-macro | `0.1095` |
|
| 115 |
+
| CheXpert F1 5-macro | `0.1634` |
|
| 116 |
+
|
| 117 |
+
### Findings-Only Frontal Test Studies
|
| 118 |
+
|
| 119 |
+
| Metric | Value |
|
| 120 |
+
| --- | --- |
|
| 121 |
+
| Number of studies | `2210` |
|
| 122 |
+
| RadGraph F1 | `0.1010` |
|
| 123 |
+
| RadGraph entity F1 | `0.1517` |
|
| 124 |
+
| RadGraph relation F1 | `0.1347` |
|
| 125 |
+
| CheXpert F1 14-micro | `0.1651` |
|
| 126 |
+
| CheXpert F1 5-micro | `0.2152` |
|
| 127 |
+
| CheXpert F1 14-macro | `0.1047` |
|
| 128 |
+
| CheXpert F1 5-macro | `0.1611` |
|
| 129 |
+
|
| 130 |
+
## Data
|
| 131 |
+
|
| 132 |
+
- Full project datasets: CheXpert and MIMIC-CXR.
|
| 133 |
+
- Intended project scope: train on curated chest X-ray/report data from both datasets and evaluate on MIMIC-CXR test studies.
|
| 134 |
+
- Current released checkpoint datasets: `CheXpert, MIMIC-CXR` for training and `CheXpert, MIMIC-CXR` for validation.
|
| 135 |
+
- Current published evaluation: MIMIC-CXR test split, `frontal-only (PA/AP)` studies.
|
| 136 |
+
|
| 137 |
+
## Evaluation
|
| 138 |
+
|
| 139 |
+
- Medical report metrics implemented in the repository include RadGraph F1 and CheXpert F1 (`14-micro`, `5-micro`, `14-macro`, `5-macro`).
|
| 140 |
+
|
| 141 |
+
## Training Snapshot
|
| 142 |
+
|
| 143 |
+
- Run: `full_3_epoch_mask_run`
|
| 144 |
+
- This section describes the completed public training run.
|
| 145 |
+
- Method: `lora_adamw`
|
| 146 |
+
- Vision encoder: `facebook/dinov3-vits16-pretrain-lvd1689m`
|
| 147 |
+
- Text decoder: `gpt2`
|
| 148 |
+
- Segmentation encoder: `facebook/dinov3-convnext-small-pretrain-lvd1689m`
|
| 149 |
+
- Image size: `512`
|
| 150 |
+
- Local batch size: `1`
|
| 151 |
+
- Effective global batch size: `8`
|
| 152 |
+
- Scheduler: `cosine`
|
| 153 |
+
- Warmup steps: `5114`
|
| 154 |
+
- Weight decay: `0.01`
|
| 155 |
+
- Steps completed: `102264`
|
| 156 |
+
- Planned total steps: `102276`
|
| 157 |
+
- Images seen: `818196`
|
| 158 |
+
- Total training time: `23.5798` hours
|
| 159 |
+
- Hardware: `NVIDIA GeForce RTX 5070`
|
| 160 |
+
- Final train loss: `1.1683`
|
| 161 |
+
- Validation loss: `1.3692`
|
| 162 |
+
|
| 163 |
+
## Status
|
| 164 |
+
|
| 165 |
+
- Project status: `Training completed`
|
| 166 |
+
- Release status: `Completed training run`
|
| 167 |
+
- Current checkpoint status: `Final completed run`
|
| 168 |
+
- Training completion toward planned run: `100.00%` (`3` / `3` epochs)
|
| 169 |
+
- Current published metrics correspond to the completed training run.
|
| 170 |
+
|
| 171 |
## Notes
|
| 172 |
|
| 173 |
- `segmenters/` contains the lung and heart segmentation checkpoints used to build anatomical attention masks.
|
| 174 |
- `evaluations/mimic_test_metrics.json` contains the latest saved MIMIC test metrics.
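
The README tables in this commit show each metric rounded to four decimals, while the JSON files keep full precision. As a minimal sketch (the `ROWS` mapping and `render_table` helper are hypothetical names, not code from this repository), the "All Frontal Test Studies" table can be rebuilt from the JSON values like this:

```python
# Display names follow the README tables; keys follow the metrics JSON.
# ROWS and render_table are illustrative names, not part of the repository.
ROWS = [
    ("Number of studies", "num_examples"),
    ("RadGraph F1", "radgraph_f1"),
    ("RadGraph entity F1", "radgraph_f1_entity"),
    ("RadGraph relation F1", "radgraph_f1_relation"),
    ("CheXpert F1 14-micro", "chexpert_f1_14_micro"),
    ("CheXpert F1 5-micro", "chexpert_f1_5_micro"),
    ("CheXpert F1 14-macro", "chexpert_f1_14_macro"),
    ("CheXpert F1 5-macro", "chexpert_f1_5_macro"),
]


def render_table(metrics):
    """Render a two-column markdown table with floats rounded to 4 decimals."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for label, key in ROWS:
        value = metrics[key]
        # Counts stay as integers; scores are shown with four decimals.
        text = str(value) if isinstance(value, int) else f"{value:.4f}"
        lines.append(f"| {label} | `{text}` |")
    return "\n".join(lines)


# Values copied from the "all_test" entry of evaluations/mimic_test_metrics.json.
all_test = {
    "num_examples": 3041,
    "radgraph_f1": 0.09181957971495504,
    "radgraph_f1_entity": 0.13993790644379023,
    "radgraph_f1_relation": 0.12464719867951028,
    "chexpert_f1_14_micro": 0.18291666666666664,
    "chexpert_f1_5_micro": 0.21831082003001773,
    "chexpert_f1_14_macro": 0.10945797832551928,
    "chexpert_f1_5_macro": 0.1633553219570594,
}

table = render_table(all_test)
print(table)
```

Passing the `findings_only_test` entry instead should give the second README table in the same way.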
|
evaluations/mimic_test_findings_only_metrics.json
ADDED
|
@@ -0,0 +1,34 @@
|
| 1 |
+
{
|
| 2 |
+
"split": "test",
|
| 3 |
+
"subset": "findings-only frontal studies",
|
| 4 |
+
"dataset": "mimic-cxr",
|
| 5 |
+
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 6 |
+
"num_examples": 2210,
|
| 7 |
+
"chexpert_f1_14_micro": 0.16506270049577138,
|
| 8 |
+
"chexpert_f1_5_micro": 0.21520692974013475,
|
| 9 |
+
"chexpert_f1_14_macro": 0.10472446617305661,
|
| 10 |
+
"chexpert_f1_5_macro": 0.16106779379149633,
|
| 11 |
+
"chexpert_f1_micro": 0.16506270049577138,
|
| 12 |
+
"chexpert_f1_macro": 0.10472446617305661,
|
| 13 |
+
"chexpert_per_label_f1": {
|
| 14 |
+
"Enlarged Cardiomediastinum": 0.0,
|
| 15 |
+
"Cardiomegaly": 0.09737827715355805,
|
| 16 |
+
"Lung Opacity": 0.0,
|
| 17 |
+
"Lung Lesion": 0.0,
|
| 18 |
+
"Edema": 0.27852998065764023,
|
| 19 |
+
"Consolidation": 0.0667384284176534,
|
| 20 |
+
"Pneumonia": 0.1375796178343949,
|
| 21 |
+
"Atelectasis": 0.0482897384305835,
|
| 22 |
+
"Pneumothorax": 0.021455938697318006,
|
| 23 |
+
"Pleural Effusion": 0.31440254429804637,
|
| 24 |
+
"Pleural Other": 0.0,
|
| 25 |
+
"Fracture": 0.06052631578947368,
|
| 26 |
+
"Support Devices": 0.4412416851441242,
|
| 27 |
+
"No Finding": 0.0
|
| 28 |
+
},
|
| 29 |
+
"radgraph_f1": 0.10102933280223365,
|
| 30 |
+
"radgraph_f1_entity": 0.15171508935265537,
|
| 31 |
+
"radgraph_f1_relation": 0.13465579667248295,
|
| 32 |
+
"radgraph_available": true,
|
| 33 |
+
"radgraph_error": null
|
| 34 |
+
}
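
The `chexpert_f1_14_macro` value in this file appears to be the unweighted mean of the 14 per-label F1 scores. The sketch below rechecks that from the numbers above. The five labels used for the `5-macro` figure are an assumption (the commonly used CheXpert competition subset); this commit does not name them, although that subset does reproduce the reported value.

```python
from statistics import fmean

# Per-label F1 scores copied from evaluations/mimic_test_findings_only_metrics.json.
per_label_f1 = {
    "Enlarged Cardiomediastinum": 0.0,
    "Cardiomegaly": 0.09737827715355805,
    "Lung Opacity": 0.0,
    "Lung Lesion": 0.0,
    "Edema": 0.27852998065764023,
    "Consolidation": 0.0667384284176534,
    "Pneumonia": 0.1375796178343949,
    "Atelectasis": 0.0482897384305835,
    "Pneumothorax": 0.021455938697318006,
    "Pleural Effusion": 0.31440254429804637,
    "Pleural Other": 0.0,
    "Fracture": 0.06052631578947368,
    "Support Devices": 0.4412416851441242,
    "No Finding": 0.0,
}

# Assumption: the "5" variants use the five CheXpert competition labels.
FIVE_LABELS = [
    "Atelectasis",
    "Cardiomegaly",
    "Consolidation",
    "Edema",
    "Pleural Effusion",
]


def macro_f1(scores, labels=None):
    """Unweighted mean of per-label F1 over the chosen labels (all if None)."""
    names = list(scores) if labels is None else labels
    return fmean(scores[name] for name in names)


macro_14 = macro_f1(per_label_f1)
macro_5 = macro_f1(per_label_f1, FIVE_LABELS)
print(f"14-macro: {macro_14:.4f}")
print(f"5-macro:  {macro_5:.4f}")
```

The micro-averaged scores cannot be recomputed this way: micro F1 pools true-positive, false-positive, and false-negative counts across labels, and those counts are not stored in this file.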
|
evaluations/mimic_test_findings_only_predictions.csv
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
evaluations/mimic_test_metrics.json
CHANGED
|
@@ -1,5 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"split": "test",
|
|
|
|
| 3 |
"dataset": "mimic-cxr",
|
| 4 |
"view_filter": "frontal-only (PA/AP)",
|
| 5 |
"num_examples": 3041,
|
|
@@ -29,5 +30,74 @@
|
|
| 29 |
"radgraph_f1_entity": 0.13993790644379023,
|
| 30 |
"radgraph_f1_relation": 0.12464719867951028,
|
| 31 |
"radgraph_available": true,
|
| 32 |
-
"radgraph_error": null
|
| 33 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"split": "test",
|
| 3 |
+
"subset": "all frontal studies",
|
| 4 |
"dataset": "mimic-cxr",
|
| 5 |
"view_filter": "frontal-only (PA/AP)",
|
| 6 |
"num_examples": 3041,
|
|
|
|
| 30 |
"radgraph_f1_entity": 0.13993790644379023,
|
| 31 |
"radgraph_f1_relation": 0.12464719867951028,
|
| 32 |
"radgraph_available": true,
|
| 33 |
+
"radgraph_error": null,
|
| 34 |
+
"evaluation_suite": "mimic_test_dual",
|
| 35 |
+
"all_test": {
|
| 36 |
+
"split": "test",
|
| 37 |
+
"subset": "all frontal studies",
|
| 38 |
+
"dataset": "mimic-cxr",
|
| 39 |
+
"view_filter": "frontal-only (PA/AP)",
|
| 40 |
+
"num_examples": 3041,
|
| 41 |
+
"chexpert_f1_14_micro": 0.18291666666666664,
|
| 42 |
+
"chexpert_f1_5_micro": 0.21831082003001773,
|
| 43 |
+
"chexpert_f1_14_macro": 0.10945797832551928,
|
| 44 |
+
"chexpert_f1_5_macro": 0.1633553219570594,
|
| 45 |
+
"chexpert_f1_micro": 0.18291666666666664,
|
| 46 |
+
"chexpert_f1_macro": 0.10945797832551928,
|
| 47 |
+
"chexpert_per_label_f1": {
|
| 48 |
+
"Enlarged Cardiomediastinum": 0.0,
|
| 49 |
+
"Cardiomegaly": 0.10195227765726682,
|
| 50 |
+
"Lung Opacity": 0.0020470829068577278,
|
| 51 |
+
"Lung Lesion": 0.0,
|
| 52 |
+
"Edema": 0.2789757412398922,
|
| 53 |
+
"Consolidation": 0.06424344885883347,
|
| 54 |
+
"Pneumonia": 0.14311926605504585,
|
| 55 |
+
"Atelectasis": 0.0428380187416332,
|
| 56 |
+
"Pneumothorax": 0.030358227079538558,
|
| 57 |
+
"Pleural Effusion": 0.32876712328767127,
|
| 58 |
+
"Pleural Other": 0.0,
|
| 59 |
+
"Fracture": 0.0633879781420765,
|
| 60 |
+
"Support Devices": 0.4767225325884544,
|
| 61 |
+
"No Finding": 0.0
|
| 62 |
+
},
|
| 63 |
+
"radgraph_f1": 0.09181957971495504,
|
| 64 |
+
"radgraph_f1_entity": 0.13993790644379023,
|
| 65 |
+
"radgraph_f1_relation": 0.12464719867951028,
|
| 66 |
+
"radgraph_available": true,
|
| 67 |
+
"radgraph_error": null
|
| 68 |
+
},
|
| 69 |
+
"findings_only_test": {
|
| 70 |
+
"split": "test",
|
| 71 |
+
"subset": "findings-only frontal studies",
|
| 72 |
+
"dataset": "mimic-cxr",
|
| 73 |
+
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 74 |
+
"num_examples": 2210,
|
| 75 |
+
"chexpert_f1_14_micro": 0.16506270049577138,
|
| 76 |
+
"chexpert_f1_5_micro": 0.21520692974013475,
|
| 77 |
+
"chexpert_f1_14_macro": 0.10472446617305661,
|
| 78 |
+
"chexpert_f1_5_macro": 0.16106779379149633,
|
| 79 |
+
"chexpert_f1_micro": 0.16506270049577138,
|
| 80 |
+
"chexpert_f1_macro": 0.10472446617305661,
|
| 81 |
+
"chexpert_per_label_f1": {
|
| 82 |
+
"Enlarged Cardiomediastinum": 0.0,
|
| 83 |
+
"Cardiomegaly": 0.09737827715355805,
|
| 84 |
+
"Lung Opacity": 0.0,
|
| 85 |
+
"Lung Lesion": 0.0,
|
| 86 |
+
"Edema": 0.27852998065764023,
|
| 87 |
+
"Consolidation": 0.0667384284176534,
|
| 88 |
+
"Pneumonia": 0.1375796178343949,
|
| 89 |
+
"Atelectasis": 0.0482897384305835,
|
| 90 |
+
"Pneumothorax": 0.021455938697318006,
|
| 91 |
+
"Pleural Effusion": 0.31440254429804637,
|
| 92 |
+
"Pleural Other": 0.0,
|
| 93 |
+
"Fracture": 0.06052631578947368,
|
| 94 |
+
"Support Devices": 0.4412416851441242,
|
| 95 |
+
"No Finding": 0.0
|
| 96 |
+
},
|
| 97 |
+
"radgraph_f1": 0.10102933280223365,
|
| 98 |
+
"radgraph_f1_entity": 0.15171508935265537,
|
| 99 |
+
"radgraph_f1_relation": 0.13465579667248295,
|
| 100 |
+
"radgraph_available": true,
|
| 101 |
+
"radgraph_error": null
|
| 102 |
+
}
|
| 103 |
}
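
With both subsets now stored under `all_test` and `findings_only_test`, the two evaluations can be compared directly. The following is a minimal sketch, assuming the nested layout shown in this diff; `subset_deltas` is a hypothetical helper, and the embedded JSON is a trimmed copy of the file (per-label scores omitted) so the example runs without the repository checkout.

```python
import json

# Trimmed copy of evaluations/mimic_test_metrics.json as shown in the diff.
METRICS_JSON = """
{
  "evaluation_suite": "mimic_test_dual",
  "all_test": {
    "num_examples": 3041,
    "chexpert_f1_14_micro": 0.18291666666666664,
    "chexpert_f1_5_micro": 0.21831082003001773,
    "chexpert_f1_14_macro": 0.10945797832551928,
    "chexpert_f1_5_macro": 0.1633553219570594,
    "radgraph_f1": 0.09181957971495504,
    "radgraph_f1_entity": 0.13993790644379023,
    "radgraph_f1_relation": 0.12464719867951028
  },
  "findings_only_test": {
    "num_examples": 2210,
    "chexpert_f1_14_micro": 0.16506270049577138,
    "chexpert_f1_5_micro": 0.21520692974013475,
    "chexpert_f1_14_macro": 0.10472446617305661,
    "chexpert_f1_5_macro": 0.16106779379149633,
    "radgraph_f1": 0.10102933280223365,
    "radgraph_f1_entity": 0.15171508935265537,
    "radgraph_f1_relation": 0.13465579667248295
  }
}
"""


def subset_deltas(metrics, base="all_test", other="findings_only_test"):
    """Return {metric: other - base} for float metrics present in both subsets."""
    deltas = {}
    for key, base_value in metrics[base].items():
        if key == "num_examples":
            continue  # a study count, not a score
        other_value = metrics[other].get(key)
        if isinstance(base_value, float) and isinstance(other_value, float):
            deltas[key] = other_value - base_value
    return deltas


metrics = json.loads(METRICS_JSON)
deltas = subset_deltas(metrics)
for name, delta in sorted(deltas.items()):
    print(f"{name}: {delta:+.4f}")
```

On these numbers the findings-only subset scores slightly higher on all three RadGraph metrics and slightly lower on all four CheXpert F1 variants. To run against the real file, replace `json.loads(METRICS_JSON)` with `json.load` on an open handle to `evaluations/mimic_test_metrics.json`.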
|
evaluations/mimic_test_predictions.csv
CHANGED
|
The diff for this file is too large to render.
See raw diff
|
|
|
run_summary.json
CHANGED
|
@@ -42,6 +42,7 @@
|
|
| 42 |
"validation_datasets": "CheXpert, MIMIC-CXR",
|
| 43 |
"latest_evaluation": {
|
| 44 |
"split": "test",
|
|
|
|
| 45 |
"dataset": "mimic-cxr",
|
| 46 |
"view_filter": "frontal-only (PA/AP)",
|
| 47 |
"num_examples": 3041,
|
|
@@ -72,5 +73,75 @@
|
|
| 72 |
"radgraph_f1_relation": 0.12464719867951028,
|
| 73 |
"radgraph_available": true,
|
| 74 |
"radgraph_error": null
|
| 75 |
}
|
| 76 |
}
|
|
|
|
| 42 |
"validation_datasets": "CheXpert, MIMIC-CXR",
|
| 43 |
"latest_evaluation": {
|
| 44 |
"split": "test",
|
| 45 |
+
"subset": "all frontal studies",
|
| 46 |
"dataset": "mimic-cxr",
|
| 47 |
"view_filter": "frontal-only (PA/AP)",
|
| 48 |
"num_examples": 3041,
|
|
|
|
| 73 |
"radgraph_f1_relation": 0.12464719867951028,
|
| 74 |
"radgraph_available": true,
|
| 75 |
"radgraph_error": null
|
| 76 |
+
},
|
| 77 |
+
"latest_evaluations": {
|
| 78 |
+
"all_test": {
|
| 79 |
+
"split": "test",
|
| 80 |
+
"subset": "all frontal studies",
|
| 81 |
+
"dataset": "mimic-cxr",
|
| 82 |
+
"view_filter": "frontal-only (PA/AP)",
|
| 83 |
+
"num_examples": 3041,
|
| 84 |
+
"chexpert_f1_14_micro": 0.18291666666666664,
|
| 85 |
+
"chexpert_f1_5_micro": 0.21831082003001773,
|
| 86 |
+
"chexpert_f1_14_macro": 0.10945797832551928,
|
| 87 |
+
"chexpert_f1_5_macro": 0.1633553219570594,
|
| 88 |
+
"chexpert_f1_micro": 0.18291666666666664,
|
| 89 |
+
"chexpert_f1_macro": 0.10945797832551928,
|
| 90 |
+
"chexpert_per_label_f1": {
|
| 91 |
+
"Enlarged Cardiomediastinum": 0.0,
|
| 92 |
+
"Cardiomegaly": 0.10195227765726682,
|
| 93 |
+
"Lung Opacity": 0.0020470829068577278,
|
| 94 |
+
"Lung Lesion": 0.0,
|
| 95 |
+
"Edema": 0.2789757412398922,
|
| 96 |
+
"Consolidation": 0.06424344885883347,
|
| 97 |
+
"Pneumonia": 0.14311926605504585,
|
| 98 |
+
"Atelectasis": 0.0428380187416332,
|
| 99 |
+
"Pneumothorax": 0.030358227079538558,
|
| 100 |
+
"Pleural Effusion": 0.32876712328767127,
|
| 101 |
+
"Pleural Other": 0.0,
|
| 102 |
+
"Fracture": 0.0633879781420765,
|
| 103 |
+
"Support Devices": 0.4767225325884544,
|
| 104 |
+
"No Finding": 0.0
|
| 105 |
+
},
|
| 106 |
+
"radgraph_f1": 0.09181957971495504,
|
| 107 |
+
"radgraph_f1_entity": 0.13993790644379023,
|
| 108 |
+
"radgraph_f1_relation": 0.12464719867951028,
|
| 109 |
+
"radgraph_available": true,
|
| 110 |
+
"radgraph_error": null
|
| 111 |
+
},
|
| 112 |
+
"findings_only_test": {
|
| 113 |
+
"split": "test",
|
| 114 |
+
"subset": "findings-only frontal studies",
|
| 115 |
+
"dataset": "mimic-cxr",
|
| 116 |
+
"view_filter": "frontal-only (PA/AP), structured Findings section only",
|
| 117 |
+
"num_examples": 2210,
|
| 118 |
+
"chexpert_f1_14_micro": 0.16506270049577138,
|
| 119 |
+
"chexpert_f1_5_micro": 0.21520692974013475,
|
| 120 |
+
"chexpert_f1_14_macro": 0.10472446617305661,
|
| 121 |
+
"chexpert_f1_5_macro": 0.16106779379149633,
|
| 122 |
+
"chexpert_f1_micro": 0.16506270049577138,
|
| 123 |
+
"chexpert_f1_macro": 0.10472446617305661,
|
| 124 |
+
"chexpert_per_label_f1": {
|
| 125 |
+
"Enlarged Cardiomediastinum": 0.0,
|
| 126 |
+
"Cardiomegaly": 0.09737827715355805,
|
| 127 |
+
"Lung Opacity": 0.0,
|
| 128 |
+
"Lung Lesion": 0.0,
|
| 129 |
+
"Edema": 0.27852998065764023,
|
| 130 |
+
"Consolidation": 0.0667384284176534,
|
| 131 |
+
"Pneumonia": 0.1375796178343949,
|
| 132 |
+
"Atelectasis": 0.0482897384305835,
|
| 133 |
+
"Pneumothorax": 0.021455938697318006,
|
| 134 |
+
"Pleural Effusion": 0.31440254429804637,
|
| 135 |
+
"Pleural Other": 0.0,
|
| 136 |
+
"Fracture": 0.06052631578947368,
|
| 137 |
+
"Support Devices": 0.4412416851441242,
|
| 138 |
+
"No Finding": 0.0
|
| 139 |
+
},
|
| 140 |
+
"radgraph_f1": 0.10102933280223365,
|
| 141 |
+
"radgraph_f1_entity": 0.15171508935265537,
|
| 142 |
+
"radgraph_f1_relation": 0.13465579667248295,
|
| 143 |
+
"radgraph_available": true,
|
| 144 |
+
"radgraph_error": null
|
| 145 |
+
}
|
| 146 |
}
|
| 147 |
}
|