---
tags:
- chest-xray
- radiology
- object-detection
- abnormality-detection
- vindr-cxr
license: apache-2.0
---

# LAPVQA: Abnormality Detection

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

DETR-style detection heads for 14-class chest abnormality detection on VinDr-CXR,
one head trained on top of each of six **frozen** vision encoders.
Each checkpoint is a dict with the keys `{state_dict, vis_dim, d_model, num_queries, num_enc, num_dec, encoder, epoch, val_map40, val_map50}`.

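Because each checkpoint carries its own hyperparameters, it can help to check the metadata before building a head. The snippet below is a minimal sketch and not part of the LAPVQA code: `EXPECTED_KEYS` mirrors the key list above, while `check_checkpoint` and the example values are hypothetical.

```python
# Keys listed in this model card; everything else here is illustrative.
EXPECTED_KEYS = {
    "state_dict", "vis_dim", "d_model", "num_queries",
    "num_enc", "num_dec", "encoder", "epoch", "val_map40", "val_map50",
}


def check_checkpoint(ckpt: dict) -> dict:
    """Return the non-weight metadata, raising if an expected key is missing."""
    missing = EXPECTED_KEYS - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")
    return {k: v for k, v in ckpt.items() if k != "state_dict"}


# Stand-in for torch.load("owlv2.pt", map_location="cpu"). The encoder name,
# epoch and validation scores are placeholders, not real values.
example_ckpt = {
    "state_dict": {}, "vis_dim": 1024, "d_model": 256, "num_queries": 20,
    "num_enc": 2, "num_dec": 3, "encoder": "owlv2", "epoch": 0,
    "val_map40": 0.0, "val_map50": 0.0,
}
meta = check_checkpoint(example_ckpt)
print(meta["encoder"], meta["vis_dim"])
```

With a real checkpoint, the same function could be called on the dict returned by `torch.load`.
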
## Architecture: `DetectionHead`

```
vis_proj       : Linear(vis_dim → 256)
encoder        : 2 × TransformerEncoderLayer  (self-attn, pre-norm)
object_queries : Parameter [1, 20, 256]
decoder        : 3 × TransformerDecoderLayer  (cross-attn to encoder output)
class_head     : Linear(256 → 15)         # 14 classes + background
box_head       : MLP(256 → 256 → 4)       # (cx, cy, w, h) ∈ [0, 1]
```

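The layout above can be approximated in plain PyTorch. This is a rough sketch, not the `lapvqa` implementation: the class name `DetectionHeadSketch`, the number of attention heads, the default feed-forward width, the absence of positional encodings, the post-norm decoder, the sigmoid on the box output and the output key names are all assumptions that the card does not confirm.

```python
import torch
from torch import nn


class DetectionHeadSketch(nn.Module):
    """Illustrative DETR-style head; NOT the lapvqa DetectionHead."""

    def __init__(self, vis_dim=1024, d_model=256, num_queries=20,
                 num_enc_layers=2, num_dec_layers=3, num_classes=14,
                 nhead=8):  # nhead is an assumption; the card does not state it
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model, nhead, batch_first=True, norm_first=True)  # pre-norm
        self.encoder = nn.TransformerEncoder(enc_layer, num_enc_layers)
        self.object_queries = nn.Parameter(torch.randn(1, num_queries, d_model))
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_dec_layers)
        self.class_head = nn.Linear(d_model, num_classes + 1)  # + background
        self.box_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 4))

    def forward(self, vis_tokens):
        # vis_tokens: [B, HW, vis_dim] patch tokens from a frozen encoder
        memory = self.encoder(self.vis_proj(vis_tokens))       # [B, HW, d_model]
        queries = self.object_queries.expand(vis_tokens.size(0), -1, -1)
        hidden = self.decoder(queries, memory)                  # [B, Q, d_model]
        return {
            "logits": self.class_head(hidden),                  # [B, Q, 15]
            "boxes": self.box_head(hidden).sigmoid(),           # (cx, cy, w, h) in [0, 1]
        }


head = DetectionHeadSketch().eval()
with torch.no_grad():
    out = head(torch.randn(2, 196, 1024))  # 2 images, 196 random patch tokens each
print(out["logits"].shape, out["boxes"].shape)
```

Because the internals are guessed, the released `state_dict` files are not expected to load into this sketch; it only illustrates the data flow and tensor shapes.
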
## Results (VinDr-CXR test, mAP@IoU=0.4)

| Encoder | mAP@0.4 (test) |
|---|---|
| OWLv2 | 0.048 |
| SigLIP | ~0.045 |
| CLIP ViT-L/14 | ~0.040 |

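For mAP at an IoU threshold of 0.4, a predicted box counts as a hit for a ground-truth box of the same class when their intersection over union reaches 0.4. The sketch below shows only that matching test for boxes in `(x1, y1, x2, y2)` form. `box_iou` and `is_match` are illustrative helpers rather than the LAPVQA evaluation code, and the rest of the mAP computation (ranking by score, averaging over classes) is omitted.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, iou_threshold=0.4):
    """True when the predicted box overlaps the ground truth enough."""
    return box_iou(pred, truth) >= iou_threshold


truth = (100, 100, 200, 200)
pred = (120, 120, 220, 220)  # same size, shifted by 20 px on both axes
print(round(box_iou(pred, truth), 3), is_match(pred, truth))  # 0.471 True
```

The same pair would not match at the stricter 0.5 threshold, which is why scores at IoU 0.4 and 0.5 are not directly comparable.
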
Checkpoint files and the `vis_dim` each head expects:

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14.pt` | CLIP ViT-L/14 | 1024 |
| `siglip.pt` | SigLIP | 1152 |
| `florence2.pt` | Florence-2 | 1024 |
| `coca.pt` | CoCa | 768 |
| `owlv2.pt` | OWLv2 | 1024 |
| `mae-vit-l16.pt` | MAE ViT-L/16 | 1024 |

## Loading

```python
import torch
from lapvqa.ad.heads import DetectionHead, predict

# The checkpoint stores both the weights and the head's hyperparameters.
ckpt = torch.load("owlv2.pt", map_location="cpu")
head = DetectionHead(
    vis_dim        = ckpt["vis_dim"],
    d_model        = ckpt["d_model"],
    num_queries    = ckpt["num_queries"],
    num_enc_layers = ckpt["num_enc"],
    num_dec_layers = ckpt["num_dec"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()

with torch.no_grad():
    # vis_tokens: [B, HW, vis_dim] spatial patch tokens from the frozen encoder
    # that matches the checkpoint (OWLv2 here); compute them before this call.
    outputs = head(vis_tokens)
    detections = predict(outputs, score_threshold=0.1, nms_iou=0.5)
# detections[i]: {'boxes': [K,4] xyxy, 'labels': [K], 'scores': [K]}
```

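The head regresses boxes as normalized `(cx, cy, w, h)`, while detections are reported as `xyxy`. The conversion between the two formats can be sketched as follows. `cxcywh_to_xyxy` is an illustrative helper, not a `lapvqa` function, and the card does not say whether `predict` rescales boxes to pixels, so the image size is left as an optional argument.

```python
def cxcywh_to_xyxy(box, image_size=(1.0, 1.0)):
    """Convert a normalized (cx, cy, w, h) box to (x1, y1, x2, y2).

    image_size is (width, height); the default keeps coordinates in [0, 1].
    """
    cx, cy, w, h = box
    img_w, img_h = image_size
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


box = (0.5, 0.5, 0.25, 0.5)  # centered box, a quarter wide and half as tall as the image
print(cxcywh_to_xyxy(box))                            # (0.375, 0.25, 0.625, 0.75)
print(cxcywh_to_xyxy(box, image_size=(1024, 1024)))   # (384.0, 256.0, 640.0, 768.0)
```
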