---
tags:
- chest-xray
- radiology
- visual-question-answering
- mimic-cxr
license: apache-2.0
---

# LAPVQA — VQA (Captioning-Pretrained Encoder)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

This is a VQA task head trained on top of the **LAPVQA captioning-pretrained encoder**
([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
The encoder is kept frozen during training, so this checkpoint contains only the `VQAHead` state dict and no encoder weights.
The encoder is a ViT-L/14 that outputs 1024-dimensional patch tokens, which is why the head is constructed with `vis_dim=1024`.
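
As a rough illustration of what "1024-dim patch tokens (ViT-L/14)" means for tensor shapes, the sketch below computes the patch-token grid for a square input. The helper name `patch_token_shape` and the 224-pixel input size are assumptions for illustration only; this card does not state the input resolution or whether a class token is prepended.

```python
def patch_token_shape(image_size: int, patch_size: int = 14, embed_dim: int = 1024) -> tuple[int, int]:
    """Return (num_patch_tokens, embed_dim) for a square image.

    Hypothetical helper: patch_size=14 and embed_dim=1024 follow the
    ViT-L/14 description above; image_size is an assumption.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side, embed_dim


# Assumed 224x224 input: a 16 x 16 grid of patches, each a 1024-dim token.
print(patch_token_shape(224))  # (256, 1024)
```

Under that assumption, the head would receive a tensor of shape `(batch, 256, 1024)` per image, excluding any class token.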

## Loading

```python
import torch

from lapvqa.vqa.model import VQAHead

# The checkpoint holds only the VQAHead state dict (no encoder weights).
ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu")

# vis_dim matches the encoder's 1024-dim patch tokens.
head = VQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # switch to inference mode

# Pair with encoder_final.pt from lapvqa-pretrain-captioning.
```
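
Since the head is only useful together with the frozen encoder, one possible way to fetch the encoder checkpoint is sketched below using `hf_hub_download` from `huggingface_hub`. The helper `fetch_encoder_checkpoint` is hypothetical, and the contents of `encoder_final.pt` (state dict or full module) are an assumption, as is the encoder class needed to rebuild the model, which this card does not name.

```python
ENCODER_REPO = "dmusingu/lapvqa-pretrain-captioning"  # repo linked in this card
ENCODER_FILE = "encoder_final.pt"  # filename mentioned in the loading snippet


def fetch_encoder_checkpoint(repo_id: str = ENCODER_REPO, filename: str = ENCODER_FILE):
    """Download the encoder checkpoint and load it on CPU (hypothetical helper)."""
    # Imported lazily so this module can be imported without these packages.
    import torch
    from huggingface_hub import hf_hub_download

    # Downloads the file into the local Hugging Face cache and returns its path.
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Assumption: the file is a regular torch checkpoint; whether it is a
    # state dict or a pickled module is not documented here.
    return torch.load(local_path, map_location="cpu")
```

Calling `fetch_encoder_checkpoint()` would return whatever object was saved in the file. If it turns out to be a fully pickled module rather than a state dict, recent PyTorch versions may require passing `weights_only=False` to `torch.load`, which should only be done for files you trust.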