---
tags:
- chest-xray
- radiology
- report-generation
- mimic-cxr
license: apache-2.0
---

# LAPVQA — Radiology Report Generation (Native / End-to-end)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

Radiology report generation (RRG) decoders trained **end-to-end** alongside their vision encoders.
Each checkpoint is a dict with the keys `state_dict`, `vis_dim`, `d_model`, `num_layers`, `nhead`, `encoder`, `epoch`, and `val_bleu4`.

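Before building a model from one of these files, it can help to confirm that the loaded dict has the layout described above. The following is a minimal sketch, not part of the `lapvqa` package: the helper names `missing_keys` and `checkpoint_metadata` are hypothetical, and the `example` dict uses placeholder values in place of a real checkpoint.

```python
# Keys that the description above says every checkpoint contains.
REQUIRED_KEYS = (
    "state_dict", "vis_dim", "d_model", "num_layers",
    "nhead", "encoder", "epoch", "val_bleu4",
)


def missing_keys(ckpt):
    """Return the required keys that are absent from a loaded checkpoint dict."""
    return [key for key in REQUIRED_KEYS if key not in ckpt]


def checkpoint_metadata(ckpt):
    """Return everything except the weights, after checking the layout."""
    missing = missing_keys(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    return {key: ckpt[key] for key in REQUIRED_KEYS if key != "state_dict"}


# Placeholder stand-in for the result of torch.load(...); apart from vis_dim,
# these values are made up for illustration and are NOT the real hyperparameters.
example = {
    "state_dict": {},
    "vis_dim": 1024,
    "d_model": 512,
    "num_layers": 6,
    "nhead": 8,
    "encoder": "mae-vit-l16",
    "epoch": 0,
    "val_bleu4": 0.0,
}

print(checkpoint_metadata(example))
```
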
| | File | Encoder | vis_dim | |
| |---|---|---| |
| | `clip-vit-l14.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 | |
| | `siglip.pt` | SigLIP (fine-tuned) | 1152 | |
| | `florence2.pt` | Florence-2 (fine-tuned) | 1024 | |
| | `coca.pt` | CoCa (fine-tuned) | 768 | |
| | `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) | 1024 | |

## Results (MIMIC-CXR test set, MAE-ViT-L/16)

| BLEU-4 | ROUGE-L | RadGraph-s |
|---|---|---|
| 0.032 | 0.164 | 0.195 |

## Loading

```python
import torch
from lapvqa.rrg.heads import ReportGenerationHead

# Load the checkpoint dict onto the CPU.
ckpt = torch.load("mae-vit-l16.pt", map_location="cpu")

# Rebuild the head from the hyperparameters stored in the checkpoint.
head = ReportGenerationHead(
    vis_dim=ckpt["vis_dim"],
    d_model=ckpt["d_model"],
    num_layers=ckpt["num_layers"],
    nhead=ckpt["nhead"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()  # switch to evaluation mode for inference
```
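
The same loading pattern applies to the other files. As a rough sketch, the constructor arguments can be pulled out of a checkpoint with a sanity check of `vis_dim` against the table in the Description. The helper `head_kwargs` is hypothetical and not part of `lapvqa`, and the `example` dict stands in for a loaded checkpoint with placeholder values.

```python
from pathlib import Path

# Expected visual feature widths, copied from the table in the Description.
EXPECTED_VIS_DIM = {
    "clip-vit-l14.pt": 1024,
    "siglip.pt": 1152,
    "florence2.pt": 1024,
    "coca.pt": 768,
    "mae-vit-l16.pt": 1024,
}

# Constructor arguments used in the loading snippet above.
HEAD_ARGS = ("vis_dim", "d_model", "num_layers", "nhead")


def head_kwargs(ckpt, filename):
    """Collect the head's constructor arguments from a checkpoint dict.

    Raises ValueError if vis_dim disagrees with the table entry for this file.
    Files that are not in the table are passed through unchecked.
    """
    expected = EXPECTED_VIS_DIM.get(Path(filename).name)
    if expected is not None and ckpt["vis_dim"] != expected:
        raise ValueError(
            f"{filename}: vis_dim is {ckpt['vis_dim']}, expected {expected}"
        )
    return {key: ckpt[key] for key in HEAD_ARGS}


# Placeholder stand-in for torch.load("mae-vit-l16.pt", map_location="cpu");
# d_model, num_layers and nhead are made-up values for illustration only.
example = {"state_dict": {}, "vis_dim": 1024, "d_model": 512,
           "num_layers": 6, "nhead": 8}

kwargs = head_kwargs(example, "mae-vit-l16.pt")
print(kwargs)
```

With a real checkpoint, the result could then be passed on as `ReportGenerationHead(**head_kwargs(ckpt, "mae-vit-l16.pt"))`, followed by `load_state_dict` as shown above.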