dmusingu committed on
Commit 3b87217 · verified · 1 Parent(s): e6fe790

Upload README.md with huggingface_hub

  1. README.md +36 -0
README.md ADDED
@@ -0,0 +1,36 @@
+ ---
+ tags:
+ - chest-xray
+ - radiology
+ - visual-question-answering
+ - differential-vqa
+ - mimic-cxr
+ license: apache-2.0
+ ---
+
+ # LAPVQA — Differential VQA (Native / End-to-end)
+
+ Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
+
+ ## Description
+
+ DiffVQA models trained **end-to-end**: the encoder and the task head are fine-tuned jointly.
+ These checkpoints serve as an upper-bound reference for the frozen-encoder variant in
+ [`lapvqa-diffvqa`](https://huggingface.co/dmusingu/lapvqa-diffvqa).
+ MAE-ViT-L/16 is the primary encoder studied in this native setting.
+
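The difference between the end-to-end setting and the frozen-encoder variant can be sketched in PyTorch as below. This is only an illustration under stated assumptions: the two `nn.Linear` layers are tiny stand-ins for the real encoder and task head, and `build_model` and `trainable_parameters` are hypothetical helper names, not part of the LAPVQA code.

```python
import torch
from torch import nn


def build_model(freeze_encoder: bool) -> nn.Sequential:
    """Build a toy encoder + head; optionally freeze the encoder weights."""
    # Stand-ins only: the real encoder (e.g. MAE ViT-L/16) and head are not shown here.
    encoder = nn.Linear(16, 8)
    head = nn.Linear(8, 4)
    if freeze_encoder:
        # Frozen-encoder variant: encoder weights receive no gradient updates.
        for param in encoder.parameters():
            param.requires_grad = False
    return nn.Sequential(encoder, head)


def trainable_parameters(model: nn.Module) -> list:
    """Return only the parameters an optimizer should update."""
    return [p for p in model.parameters() if p.requires_grad]


frozen = build_model(freeze_encoder=True)   # only the head is trained
native = build_model(freeze_encoder=False)  # encoder and head are trained jointly

# In the end-to-end setting the optimizer sees every parameter.
optimizer = torch.optim.AdamW(trainable_parameters(native), lr=1e-4)
```

In the frozen case only the head's parameters reach the optimizer, while the end-to-end case updates both modules; the learning rate shown is an arbitrary placeholder.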
+ ## Results (test set, MAE-ViT-L/16)
+
+ | BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
+ |---|---|---|---|
+ | 0.472 | 0.573 | 0.288 | 0.938 |
+
+ ## Files
+
+ | File | Encoder backbone |
+ |---|---|
+ | `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) |
+ | `coca_best.pt` | CoCa (fine-tuned) |
+ | `florence2_best.pt` | Florence-2 (fine-tuned) |
+ | `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) |
+ | `siglip_best.pt` | SigLIP (fine-tuned) |
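
One way to fetch a checkpoint from the table above is with `hf_hub_download` from `huggingface_hub`. The sketch below is a minimal example, not the authors' loading code: the short backbone keys, `checkpoint_filename`, and `download_checkpoint` are hypothetical names introduced here, and the repository ID is left as a parameter because the README does not state it.

```python
from pathlib import Path

# File names copied from the Files table; the short keys are illustrative only.
CHECKPOINTS = {
    "clip-vit-l14": "clip-vit-l14_best.pt",
    "coca": "coca_best.pt",
    "florence2": "florence2_best.pt",
    "mae-vit-l16": "mae-vit-l16_best.pt",
    "siglip": "siglip_best.pt",
}


def checkpoint_filename(backbone: str) -> str:
    """Map a backbone key to its checkpoint file name."""
    try:
        return CHECKPOINTS[backbone]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unknown backbone {backbone!r}; expected one of: {known}") from None


def download_checkpoint(repo_id: str, backbone: str) -> Path:
    """Download one checkpoint from the Hub and return its local cache path."""
    # Imported lazily so the mapping above is usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(backbone))
    return Path(local_path)
```

Calling `download_checkpoint(repo_id, "mae-vit-l16")` with this repository's ID would return the cached path of `mae-vit-l16_best.pt`. The internal layout of the `.pt` files (plain state dict or a wrapped training checkpoint) is not documented here, so inspect the object returned by `torch.load(path, map_location="cpu")` before passing it to a model.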