---
tags:
- chest-xray
- radiology
- report-generation
- mimic-cxr
license: apache-2.0
---

# LAPVQA — Radiology Report Generation (Native / End-to-end)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

RRG decoders trained **end-to-end** alongside their vision encoders.
Each checkpoint is a dict: `{state_dict, vis_dim, d_model, num_layers, nhead, encoder, epoch, val_bleu4}`.

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 |
| `siglip.pt` | SigLIP (fine-tuned) | 1152 |
| `florence2.pt` | Florence-2 (fine-tuned) | 1024 |
| `coca.pt` | CoCa (fine-tuned) | 768 |
| `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) | 1024 |

## Results (MIMIC-CXR test set, MAE-ViT-L/16)

| BLEU-4 | ROUGE-L | RadGraph-s |
|---|---|---|
| 0.032 | 0.164 | 0.195 |

## Loading

```python
import torch
from lapvqa.rrg.heads import ReportGenerationHead

ckpt = torch.load("mae-vit-l16.pt", map_location="cpu")
head = ReportGenerationHead(
    vis_dim    = ckpt["vis_dim"],
    d_model    = ckpt["d_model"],
    num_layers = ckpt["num_layers"],
    nhead      = ckpt["nhead"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()
```
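
Before building the head, it can help to check that a loaded checkpoint has the documented keys and that its `vis_dim` matches the table in the Description. The following is a minimal sketch in plain Python. `validate_checkpoint`, `REQUIRED_KEYS`, and `EXPECTED_VIS_DIM` are hypothetical names introduced here for illustration and are not part of the `lapvqa` package. The stand-in dict at the end uses dummy values rather than real checkpoint contents.

```python
# Hypothetical helper, not part of lapvqa: sanity-check a checkpoint dict
# against the key list and vis_dim table given in this README.

# Keys listed in the Description section.
REQUIRED_KEYS = {
    "state_dict", "vis_dim", "d_model", "num_layers",
    "nhead", "encoder", "epoch", "val_bleu4",
}

# vis_dim per file, copied from the table in the Description section.
EXPECTED_VIS_DIM = {
    "clip-vit-l14.pt": 1024,
    "siglip.pt": 1152,
    "florence2.pt": 1024,
    "coca.pt": 768,
    "mae-vit-l16.pt": 1024,
}


def validate_checkpoint(filename, ckpt):
    """Raise if ckpt lacks a documented key or has an unexpected vis_dim."""
    missing = REQUIRED_KEYS - set(ckpt)
    if missing:
        raise KeyError(f"{filename}: missing keys {sorted(missing)}")
    expected = EXPECTED_VIS_DIM.get(filename)
    # Files not listed in the table are accepted without a vis_dim check.
    if expected is not None and ckpt["vis_dim"] != expected:
        raise ValueError(
            f"{filename}: vis_dim is {ckpt['vis_dim']}, expected {expected}"
        )


# Stand-in dict with dummy values; a real one would come from torch.load(...).
dummy_ckpt = dict.fromkeys(REQUIRED_KEYS)
dummy_ckpt["vis_dim"] = 1024
validate_checkpoint("mae-vit-l16.pt", dummy_ckpt)  # passes silently
```

In practice the call would sit between `torch.load(...)` and the `ReportGenerationHead(...)` constructor, so a mismatched file fails with a readable message instead of a shape error inside `load_state_dict`.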