--- tags: - chest-xray - radiology - visual-question-answering - mimic-cxr license: apache-2.0 --- # LAPVQA — VQA (Captioning-Pretrained Encoder) Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa). ## Description VQA task head trained on top of the **LAPVQA captioning-pretrained encoder** ([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)). The encoder is kept frozen; this checkpoint contains the `VQAHead` state dict only. The encoder outputs 1024-dim patch tokens (ViT-L/14). ## Loading ```python import torch from lapvqa.vqa.model import VQAHead ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu") head = VQAHead(vis_dim=1024) head.load_state_dict(ckpt) head.eval() # pair with encoder_final.pt from lapvqa-pretrain-captioning ```