Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,36 @@
---
tags:
- chest-xray
- radiology
- visual-question-answering
- differential-vqa
- mimic-cxr
license: apache-2.0
---

# LAPVQA — Differential VQA (Native / End-to-end)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

DiffVQA models trained **end-to-end**: the encoder and the task head are fine-tuned jointly.
These models serve as a strong upper-bound reference for the frozen-encoder variant in
[`lapvqa-diffvqa`](https://huggingface.co/dmusingu/lapvqa-diffvqa).
MAE-ViT-L/16 is the primary encoder studied in this native setting.

## Results (test set, MAE-ViT-L/16)

| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
|---|---|---|---|
| 0.472 | 0.573 | 0.288 | 0.938 |

## Files

| File | Encoder backbone |
|---|---|
| `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) |
| `coca_best.pt` | CoCa (fine-tuned) |
| `florence2_best.pt` | Florence-2 (fine-tuned) |
| `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) |
| `siglip_best.pt` | SigLIP (fine-tuned) |
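
A checkpoint from the table above could be fetched roughly as follows. This is a minimal sketch, not the authors' loading code. The filenames come from the Files table. The repository id is a placeholder that must be replaced, the helper names (`checkpoint_filename`, `download_checkpoint`, `load_checkpoint`) are hypothetical, and the internal layout of the `.pt` files is not documented here, so the sketch only loads the object for inspection.

```python
# Checkpoint filenames copied from the Files table, keyed by a short encoder name.
CHECKPOINTS = {
    "clip-vit-l14": "clip-vit-l14_best.pt",
    "coca": "coca_best.pt",
    "florence2": "florence2_best.pt",
    "mae-vit-l16": "mae-vit-l16_best.pt",
    "siglip": "siglip_best.pt",
}

# ASSUMPTION: placeholder only; replace with this repository's actual id.
REPO_ID = "<user>/<this-repo>"


def checkpoint_filename(encoder: str) -> str:
    """Return the checkpoint filename for a short encoder name (hypothetical helper)."""
    try:
        return CHECKPOINTS[encoder]
    except KeyError:
        raise ValueError(
            f"unknown encoder {encoder!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None


def download_checkpoint(encoder: str, repo_id: str = REPO_ID) -> str:
    """Download one checkpoint from the Hub and return its local cache path."""
    # Imported lazily so the filename helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(encoder))


def load_checkpoint(encoder: str, repo_id: str = REPO_ID):
    """Load the downloaded checkpoint on CPU for inspection.

    ASSUMPTION: the file was written with torch.save. Its structure (plain
    state dict or a wrapper dict) is not stated in this README.
    """
    import torch

    path = download_checkpoint(encoder, repo_id)
    return torch.load(path, map_location="cpu")
```

With a real repository id in place, `load_checkpoint("mae-vit-l16")` would return whatever object the checkpoint stores. Printing its type, and its keys if it is a dict, is a reasonable first step before wiring it into a model. Recent PyTorch releases default `torch.load` to `weights_only=True`, which can reject checkpoints that contain non-tensor Python objects.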