dmusingu/lapvqa-diffvqa-native

+---
+tags:
+- chest-xray
+- radiology
+- visual-question-answering
+- differential-vqa
+- mimic-cxr
+license: apache-2.0
+---
+# LAPVQA — Differential VQA (Native / End-to-end)
+Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
+## Description
+DiffVQA models trained **end-to-end**: the image encoder and the task head are fine-tuned jointly.
+They are intended as an upper-bound reference for the frozen-encoder variant in
+[`lapvqa-diffvqa`](https://huggingface.co/dmusingu/lapvqa-diffvqa).
+MAE-ViT-L/16 is the primary encoder studied in this native setting.
+## Results (test set, MAE-ViT-L/16)
+| BLEU-4 | ROUGE-2 | RadGraph-s | BERTScore F1 |
+|---|---|---|---|
+| 0.472 | 0.573 | 0.288 | 0.938 |
+## Files
+| File | Encoder backbone |
+|---|---|
+| `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) |
+| `coca_best.pt` | CoCa (fine-tuned) |
+| `florence2_best.pt` | Florence-2 (fine-tuned) |
+| `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) |
+| `siglip_best.pt` | SigLIP (fine-tuned) |
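
The files above can be fetched individually from the Hub. The following is a minimal sketch, not the authors' loading code: it assumes the repository id is `dmusingu/lapvqa-diffvqa-native` and that each `.pt` file can be read with `torch.load`. The card does not document the layout of the saved object (plain state dict or a wrapper with extra keys), so inspect it before building a model around it. The helper names `checkpoint_filename`, `download_checkpoint`, and `load_checkpoint` are hypothetical.

```python
# Assumption: repository id taken from the card's page header.
REPO_ID = "dmusingu/lapvqa-diffvqa-native"

# Filenames copied from the Files table; the short keys are illustrative.
CHECKPOINTS = {
    "clip-vit-l14": "clip-vit-l14_best.pt",
    "coca": "coca_best.pt",
    "florence2": "florence2_best.pt",
    "mae-vit-l16": "mae-vit-l16_best.pt",
    "siglip": "siglip_best.pt",
}


def checkpoint_filename(encoder: str) -> str:
    """Map a short encoder key to the checkpoint filename listed in the card."""
    try:
        return CHECKPOINTS[encoder]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unknown encoder {encoder!r}; expected one of: {known}") from None


def download_checkpoint(encoder: str) -> str:
    """Download one checkpoint into the local Hub cache and return its path."""
    # Imported lazily so the mapping above is usable without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(encoder))


def load_checkpoint(path: str):
    """Load a checkpoint on CPU.

    weights_only=True restricts unpickling to tensors and basic containers.
    It may fail if the file stores other Python objects; the card does not
    say what the file contains.
    """
    import torch

    return torch.load(path, map_location="cpu", weights_only=True)


if __name__ == "__main__":
    local_path = download_checkpoint("mae-vit-l16")
    obj = load_checkpoint(local_path)
    # Inspect the top-level structure before assuming it is a state dict.
    print(type(obj))
    if isinstance(obj, dict):
        print(list(obj)[:10])
```

Running the script downloads only the MAE ViT-L/16 checkpoint, the encoder for which test results are reported above, and prints the top-level keys so the expected model architecture can be matched against them.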