dmusingu committed on
Commit 47b83de · verified · 1 Parent(s): ed050db

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ tags:
+ - chest-xray
+ - radiology
+ - phrase-grounding
+ - mimic-cxr
+ license: apache-2.0
+ ---
+
+ # LAPVQA — Phrase Grounding
+
+ Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
+
+ ## Description
+
+ Phrase grounding heads trained on MIMIC-CXR, predicting the bounding box of a described
+ abnormality given the image and a text phrase (e.g. "Pleural Effusion").
+ Six frozen encoder backbones are covered; each file contains only the grounding head weights.
+
+ ## Results (MIMIC-CXR test set)
+
+ **Zero-shot (no fine-tuning):**
+
+ | Encoder | mIoU | Acc@0.25 |
+ |---|---|---|
+ | SigLIP | 0.086 | 0.042 |
+ | Florence-2 | 0.089 | 0.046 |
+ | CLIP ViT-L/14 | 0.085 | 0.028 |
+ | CoCa | 0.082 | 0.023 |
+ | OWLv2 | 0.082 | 0.023 |
+
+ **Fine-tuned (MAE-ViT-L/16):**
+
+ | mIoU | Acc@0.25 | Acc@0.50 | Pointing Acc |
+ |---|---|---|---|
+ | 0.320 | 0.569 | 0.273 | 0.593 |
+
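+ In mIoU, the fine-tuned MAE-ViT-L/16 model scores roughly 3.6× to 3.9× (about 4×) higher than the zero-shot encoders (0.320 vs. 0.082 to 0.089); the Acc@0.25 gap is larger (0.569 vs. 0.023 to 0.046).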
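The README does not define the reported metrics, so the following is only a minimal sketch of how mIoU, Acc@0.25, Acc@0.50, and pointing accuracy are commonly computed for phrase grounding. It assumes boxes in `(x1, y1, x2, y2)` corner format, that Acc@t counts predictions with IoU of at least `t`, and that pointing accuracy means the center of the predicted box falls inside the ground-truth box. All function names here are hypothetical, and the definitions behind the numbers above may differ.

```python
from typing import Dict, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_in_box(pred: Box, gt: Box) -> bool:
    """Assumed pointing criterion: predicted box center lies inside the ground truth."""
    cx = (pred[0] + pred[2]) / 2.0
    cy = (pred[1] + pred[3]) / 2.0
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]


def grounding_metrics(
    preds: Sequence[Box],
    gts: Sequence[Box],
    thresholds: Sequence[float] = (0.25, 0.50),
) -> Dict[str, float]:
    """Aggregate mIoU, Acc@t for each threshold, and pointing accuracy."""
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("preds and gts must be non-empty and the same length")
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    out = {"mIoU": sum(ious) / n}
    for t in thresholds:
        # Fraction of samples whose IoU reaches the threshold.
        out[f"Acc@{t:.2f}"] = sum(v >= t for v in ious) / n
    out["PointingAcc"] = sum(center_in_box(p, g) for p, g in zip(preds, gts)) / n
    return out
```

Under these assumptions, pointing accuracy can exceed Acc@0.25 (as it does in the fine-tuned table) because a box can be centered inside the target while still overlapping it poorly.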
+
+ ## Files
+
+ | File | Encoder backbone |
+ |---|---|
+ | `clip-vit-l14.pt` | CLIP ViT-L/14 |
+ | `siglip.pt` | SigLIP |
+ | `florence2.pt` | Florence-2 |
+ | `coca.pt` | CoCa |
+ | `owlv2.pt` | OWLv2 |
+ | `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) |
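As a rough illustration of how one of the listed files might be fetched and opened, the sketch below maps each encoder name from the table to its checkpoint file and downloads it with `huggingface_hub`. The repository id is not stated in this commit, so `repo_id` is left as a caller-supplied parameter. The helper names are hypothetical, and the internal layout of each `.pt` file (for example, a plain state dict) is an assumption that should be checked after loading.

```python
# Encoder name -> checkpoint file, copied from the Files table.
CHECKPOINTS = {
    "CLIP ViT-L/14": "clip-vit-l14.pt",
    "SigLIP": "siglip.pt",
    "Florence-2": "florence2.pt",
    "CoCa": "coca.pt",
    "OWLv2": "owlv2.pt",
    "MAE ViT-L/16": "mae-vit-l16.pt",
}


def checkpoint_filename(encoder: str) -> str:
    """Return the checkpoint file name for an encoder listed in the table."""
    try:
        return CHECKPOINTS[encoder]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"Unknown encoder {encoder!r}; expected one of: {known}") from None


def load_grounding_head(repo_id: str, encoder: str):
    """Download one checkpoint and load it on CPU.

    repo_id is a placeholder the caller must supply; it is not given in the README.
    The returned object is assumed (not confirmed) to be a state dict of head weights.
    """
    # Imported lazily so the mapping above is usable without these packages installed.
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(encoder))
    return torch.load(path, map_location="cpu")
```

Since each file holds only grounding head weights, the matching frozen encoder would still need to be loaded separately before running inference.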