	---
	tags:
	- chest-xray
	- radiology
	- report-generation
	- mimic-cxr
	license: apache-2.0
	---

	# LAPVQA — Radiology Report Generation (Native / End-to-end)

	Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

	## Description

	This repository contains radiology report generation (RRG) decoders, each trained end-to-end together with its vision encoder.
	Each checkpoint is a Python dict with the keys `state_dict`, `vis_dim`, `d_model`, `num_layers`, `nhead`, `encoder`, `epoch`, and `val_bleu4`.
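
As a quick sanity check before building a model, the checkpoint metadata can be separated from the weights. The sketch below assumes only the key names listed above; `split_checkpoint` and the dummy values are hypothetical illustrations, not part of the `lapvqa` package.

```python
# Keys listed in the checkpoint description above.
EXPECTED_KEYS = (
    "state_dict", "vis_dim", "d_model", "num_layers",
    "nhead", "encoder", "epoch", "val_bleu4",
)


def split_checkpoint(ckpt: dict) -> tuple[dict, dict]:
    """Return (metadata, state_dict); raise KeyError if a documented key is missing."""
    missing = [k for k in EXPECTED_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    metadata = {k: ckpt[k] for k in EXPECTED_KEYS if k != "state_dict"}
    return metadata, ckpt["state_dict"]


# Stand-in for torch.load("mae-vit-l16.pt", map_location="cpu").
# The values below are placeholders, not the real checkpoint contents.
dummy_ckpt = {
    "state_dict": {}, "vis_dim": 1024, "d_model": 512, "num_layers": 6,
    "nhead": 8, "encoder": "mae-vit-l16", "epoch": 0, "val_bleu4": 0.0,
}
metadata, weights = split_checkpoint(dummy_ckpt)
print(metadata)
```

With a real checkpoint, pass the dict returned by `torch.load` in place of `dummy_ckpt`.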

	| File | Encoder | vis_dim |
	|---|---|---|
	| `clip-vit-l14.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 |
	| `siglip.pt` | SigLIP (fine-tuned) | 1152 |
	| `florence2.pt` | Florence-2 (fine-tuned) | 1024 |
	| `coca.pt` | CoCa (fine-tuned) | 768 |
	| `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) | 1024 |
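
The table can also be turned into a lookup that guards against pairing a checkpoint with the wrong feature width. This is a minimal sketch: `EXPECTED_VIS_DIM` simply copies the table, and `check_vis_dim` is a hypothetical helper, not a function from `lapvqa`.

```python
# vis_dim per checkpoint file, copied from the table above.
EXPECTED_VIS_DIM = {
    "clip-vit-l14.pt": 1024,
    "siglip.pt": 1152,
    "florence2.pt": 1024,
    "coca.pt": 768,
    "mae-vit-l16.pt": 1024,
}


def check_vis_dim(filename: str, ckpt: dict) -> int:
    """Return ckpt["vis_dim"] after confirming it matches the documented value."""
    expected = EXPECTED_VIS_DIM[filename]
    actual = ckpt["vis_dim"]
    if actual != expected:
        raise ValueError(f"{filename}: vis_dim is {actual}, table says {expected}")
    return actual
```

For example, `check_vis_dim("siglip.pt", ckpt)` could be called right after loading `siglip.pt` and before constructing the head.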

	## Results (MIMIC-CXR test set, MAE ViT-L/16)

	| BLEU-4 | ROUGE-L | RadGraph-s |
	|---|---|---|
	| 0.032 | 0.164 | 0.195 |

	## Loading

	```python
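	import torch
	from lapvqa.rrg.heads import ReportGenerationHead

	# Load the checkpoint dict on CPU; the hyperparameters are stored next to the weights.
	ckpt = torch.load("mae-vit-l16.pt", map_location="cpu")

	# Rebuild the head with the hyperparameters saved in the checkpoint.
	head = ReportGenerationHead(
	    vis_dim=ckpt["vis_dim"],
	    d_model=ckpt["d_model"],
	    num_layers=ckpt["num_layers"],
	    nhead=ckpt["nhead"],
	)
	head.load_state_dict(ckpt["state_dict"])
	head.eval()  # switch to evaluation mode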
	```
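
The snippet above expects `mae-vit-l16.pt` to be present locally. One way to obtain it is sketched below, assuming the `huggingface_hub` package is installed and that the files live in the Hub repository `dmusingu/lapvqa-rrg-native` (an assumption inferred from this model page); `fetch_checkpoint` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: repository id inferred from this model page.
REPO_ID = "dmusingu/lapvqa-rrg-native"


def fetch_checkpoint(filename: str, repo_id: str = REPO_ID) -> Path:
    """Download one checkpoint file from the Hub (cached locally) and return its path."""
    # Imported lazily so the helper can be defined without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename))


# Usage (requires network access):
#   ckpt_path = fetch_checkpoint("mae-vit-l16.pt")
#   ckpt = torch.load(ckpt_path, map_location="cpu")
```

Repeated calls should reuse the local Hub cache instead of downloading the file again.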