---
tags:
- chest-xray
- radiology
- visual-question-answering
- mimic-cxr
license: apache-2.0
---
# LAPVQA — VQA (Captioning-Pretrained Encoder)
Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).
## Description
VQA task head trained on top of the **LAPVQA captioning-pretrained encoder**
([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
The encoder is kept frozen, so this checkpoint contains only the `VQAHead` state dict; the encoder weights are loaded separately.
The encoder (ViT-L/14) outputs 1024-dimensional patch tokens, matching the head's `vis_dim=1024`.
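The number of tokens the head receives per image depends on the encoder's input resolution and on whether a class token is kept. This card states neither. The sketch below shows how the token count follows from the ViT patch grid, assuming square inputs and a plain ViT-L/14. The 224×224 default is a common choice and has not been confirmed for LAPVQA, and `vit_token_shape` is a hypothetical helper, not part of `lapvqa`.

```python
def vit_token_shape(image_size: int = 224, patch_size: int = 14,
                    embed_dim: int = 1024, cls_token: bool = True) -> tuple[int, int]:
    """Return (num_tokens, embed_dim) for a square image fed to a plain ViT.

    Hypothetical helper: assumes non-overlapping square patches and an
    optional single class token prepended to the patch sequence.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # e.g. 224 // 14 = 16
    num_tokens = patches_per_side ** 2            # 16 * 16 = 256 patch tokens
    if cls_token:
        num_tokens += 1                           # plus one class token
    return num_tokens, embed_dim


print(vit_token_shape())                  # (257, 1024)
print(vit_token_shape(cls_token=False))   # (256, 1024)
print(vit_token_shape(336))               # (577, 1024)
```

Only the last dimension (1024) is fixed by the ViT-L/14 width. The sequence length changes with resolution, so it is worth checking which resolution the `lapvqa-pretrain-captioning` encoder was trained at.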
## Loading
```python
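import torch
from lapvqa.vqa.model import VQAHead

# The checkpoint is a plain VQAHead state dict (no encoder weights).
ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu", weights_only=True)

# vis_dim must match the encoder's 1024-dim ViT-L/14 patch tokens.
head = VQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # inference mode (affects dropout/batch-norm layers, if any)

# Pair with encoder_final.pt from lapvqa-pretrain-captioning,
# which provides the frozen encoder that produces the head's inputs.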
```