---
language: en
tags:
- clip
- medical-imaging
- radiology
- roco
- vision-language
base_model: openai/clip-vit-base-patch32
datasets:
- eltorio/ROCO-radiology
metrics:
- recall
license: mit
---
# ROCO-Radiology-CLIP (ViT-B/32)
> **A specialized vision-language model for radiology, fine-tuned on the ROCO dataset.**

This model aligns medical images (X-rays, CTs, MRIs) with their textual descriptions, enabling **zero-shot classification** and **semantic search** for radiology concepts.
## Performance (Test Set)
- **Batch-wise Recall@1:** 70.83% (fine-tuned on a single T4 GPU)
- **Batch-wise Recall@5:** 96.99%
- **Global retrieval Recall@1:** ~6% (roughly 500x random chance over the full test set)
- **Global retrieval Recall@5:** ~16%
Global retrieval recall is still quite low, so there is substantial room for improvement; updated versions of the model will be released as it improves.
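For reference, batch-wise Recall@K here means: within an evaluation batch of paired image/text embeddings, the fraction of images whose own caption ranks in the top-K texts by cosine similarity. A minimal sketch of that computation (random or identity embeddings stand in for real model outputs):

```python
import numpy as np

def recall_at_k(image_emb: np.ndarray, text_emb: np.ndarray, k: int) -> float:
    """Fraction of images whose paired caption (same row index) appears in the
    top-k most similar texts by cosine similarity."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = image_emb @ text_emb.T                        # (N, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]              # top-k text indices per image
    hits = (topk == np.arange(len(sims))[:, None]).any(axis=1)
    return float(hits.mean())

# Identical embeddings -> every image retrieves its own caption
emb = np.eye(4)
print(recall_at_k(emb, emb, 1))  # 1.0
```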
## Usage
```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("spicy03/CLIP-ROCO-v1")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Zero-shot classification: score each label against the image
image = Image.open("chest_xray.jpg")
labels = ["Pneumonia", "Normal", "Edema"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)
```
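For the semantic-search use case mentioned above, you can embed an image collection once with `get_image_features` and rank it against a text query with `get_text_features`. A minimal sketch; the filenames and query string are placeholders:

```python
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

model = CLIPModel.from_pretrained("spicy03/CLIP-ROCO-v1")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical image collection: embed once, reuse for many queries
paths = ["scan_001.jpg", "scan_002.jpg", "scan_003.jpg"]
images = [Image.open(p) for p in paths]
with torch.no_grad():
    img_inputs = processor(images=images, return_tensors="pt")
    img_emb = model.get_image_features(**img_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)   # L2-normalize

    txt_inputs = processor(text=["pleural effusion on chest CT"],
                           return_tensors="pt", padding=True)
    txt_emb = model.get_text_features(**txt_inputs)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)

scores = (txt_emb @ img_emb.T).squeeze(0)  # cosine similarity per image
best = scores.argmax().item()
print(f"Best match: {paths[best]} (score={scores[best]:.3f})")
```

Because both embeddings are L2-normalized, the dot product is exactly cosine similarity, so scores are comparable across queries.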