Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,49 @@
---
tags:
- chest-xray
- radiology
- phrase-grounding
- mimic-cxr
license: apache-2.0
---

# LAPVQA — Phrase Grounding

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

This repository contains phrase grounding heads trained on MIMIC-CXR. Given a chest X-ray image and a text phrase (e.g. "Pleural Effusion"), each head predicts the bounding box of the described abnormality.
Six frozen encoder backbones are covered; each file contains only the grounding head weights.

## Results (MIMIC-CXR test set)

**Zero-shot (no fine-tuning):**

| Encoder | mIoU | Acc@0.25 |
|---|---|---|
| SigLIP | 0.086 | 0.042 |
| Florence-2 | 0.089 | 0.046 |
| CLIP ViT-L/14 | 0.085 | 0.028 |
| CoCa | 0.082 | 0.023 |
| OWLv2 | 0.082 | 0.023 |

**Fine-tuned (MAE-ViT-L/16):**

| mIoU | Acc@0.25 | Acc@0.50 | Pointing Acc |
|---|---|---|---|
| 0.320 | 0.569 | 0.273 | 0.593 |

The fine-tuned MAE-ViT-L/16 head reaches about 3.6–3.9× the mIoU of the zero-shot encoders (0.320 vs. 0.082–0.089) and about 12–25× their Acc@0.25 (0.569 vs. 0.023–0.046).
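
The README does not define the reported metrics, so the following is a minimal sketch under the usual conventions for grounding benchmarks, not the evaluation code behind the numbers above. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples, that Acc@t is the fraction of samples whose IoU is at least `t`, and that pointing accuracy counts a hit when the center of the predicted box falls inside the ground-truth box. The function names `box_iou`, `pointing_hit`, and `grounding_metrics` are illustrative.

```python
from typing import Dict, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in any consistent unit.
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, gt: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0


def pointing_hit(pred: Box, gt: Box) -> bool:
    """Assumed definition: the predicted box center lies inside the ground-truth box."""
    cx = (pred[0] + pred[2]) / 2
    cy = (pred[1] + pred[3]) / 2
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]


def grounding_metrics(
    preds: Sequence[Box],
    gts: Sequence[Box],
    thresholds: Sequence[float] = (0.25, 0.50),
) -> Dict[str, float]:
    """Compute mIoU, Acc@t for each threshold, and pointing accuracy."""
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("preds and gts must be non-empty and of equal length")
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    metrics = {"mIoU": sum(ious) / n}
    for t in thresholds:
        # Acc@t: share of samples whose IoU reaches the threshold.
        metrics[f"Acc@{t:.2f}"] = sum(iou >= t for iou in ious) / n
    metrics["Pointing Acc"] = sum(pointing_hit(p, g) for p, g in zip(preds, gts)) / n
    return metrics
```

Calling `grounding_metrics(preds, gts)` on two equal-length lists of boxes returns a dictionary keyed by `mIoU`, `Acc@0.25`, `Acc@0.50`, and `Pointing Acc`, matching the column names in the tables.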

## Files

| File | Encoder backbone |
|---|---|
| `clip-vit-l14.pt` | CLIP ViT-L/14 |
| `siglip.pt` | SigLIP |
| `florence2.pt` | Florence-2 |
| `coca.pt` | CoCa |
| `owlv2.pt` | OWLv2 |
| `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) |
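
One way to fetch a single head file is with `hf_hub_download` from `huggingface_hub`, followed by `torch.load`. This is a hedged sketch: the README does not state the repository id, so `repo_id` is left as a caller-supplied argument. The short encoder keys in `HEAD_FILES` and the helper names are illustrative. It also assumes each `.pt` file is a plain PyTorch checkpoint (for example a state dict), which the README does not confirm.

```python
# File names taken from the Files table; the short keys are illustrative.
HEAD_FILES = {
    "clip-vit-l14": "clip-vit-l14.pt",
    "siglip": "siglip.pt",
    "florence2": "florence2.pt",
    "coca": "coca.pt",
    "owlv2": "owlv2.pt",
    "mae-vit-l16": "mae-vit-l16.pt",
}


def head_filename(encoder: str) -> str:
    """Map a short encoder key to the checkpoint file name listed in the table."""
    try:
        return HEAD_FILES[encoder]
    except KeyError:
        raise ValueError(
            f"unknown encoder {encoder!r}; expected one of {sorted(HEAD_FILES)}"
        ) from None


def load_head_checkpoint(repo_id: str, encoder: str):
    """Download one grounding head file and load it on CPU.

    repo_id is the Hub repository that hosts these files; it is not given in
    the README, so the caller must supply it.
    """
    # Imported lazily so the file name lookup works without these packages.
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=head_filename(encoder))
    # Assumption: the file is a regular PyTorch checkpoint. The files hold only
    # head weights, so the matching encoder must be loaded separately.
    return torch.load(path, map_location="cpu")
```

Because each file contains only the grounding head weights, the loaded object still has to be paired with the corresponding frozen encoder and a head module of the right shape, neither of which is specified here.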