dmusingu/lapvqa-vqa-pretrain-captioning


---
tags:
- chest-xray
- radiology
- visual-question-answering
- mimic-cxr
license: apache-2.0
---

# LAPVQA: VQA (Captioning-Pretrained Encoder)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

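## Description

This checkpoint is a visual question answering (VQA) task head trained on top of the LAPVQA captioning-pretrained encoder
([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
The encoder is kept frozen, so this checkpoint contains only the `VQAHead` state dict and must be paired with the encoder weights from that repository.
The encoder is a ViT-L/14 that outputs 1024-dimensional patch tokens, which is why the head is built with `vis_dim=1024`.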
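
To make the token shape concrete, here is a small sketch of how many patch tokens a ViT-L/14 encoder would produce per image. The patch size of 14 and the width of 1024 follow from the ViT-L/14 name. The 224×224 input resolution, the helper name `patch_token_shape`, and the omission of any class token are assumptions for illustration only, so check the encoder repository for the actual preprocessing.

```python
def patch_token_shape(
    image_size: int = 224,  # assumed resolution, not stated in this model card
    patch_size: int = 14,   # the "/14" in ViT-L/14
    width: int = 1024,      # ViT-L hidden size, i.e. the patch-token dimension
) -> tuple[int, int]:
    """Return (num_patch_tokens, token_dim) for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side, width


num_tokens, token_dim = patch_token_shape()
print(num_tokens, token_dim)  # 256 1024
```

Under these assumptions the encoder would emit 256 tokens of width 1024 per image, and that width is the value passed to the head as `vis_dim`.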

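## Loading

```python
import torch
from lapvqa.vqa.model import VQAHead

# The checkpoint holds only the VQAHead state dict; load it onto the CPU.
ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu")

# vis_dim matches the encoder's 1024-dim patch tokens (ViT-L/14).
head = VQAHead(vis_dim=1024)
head.load_state_dict(ckpt)
head.eval()  # inference mode: disables dropout and other training-only behavior

# Pair with encoder_final.pt from lapvqa-pretrain-captioning.
```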