---
language: en
tags:
- clip
- medical-imaging
- radiology
- roco
- vision-language
base_model: openai/clip-vit-base-patch32
datasets:
- eltorio/ROCO-radiology
metrics:
- recall
license: mit
---

# ROCO-Radiology-CLIP (ViT-B/32)

> **A specialized vision-language model for radiology, fine-tuned on the ROCO dataset.**

This model aligns medical images (X-rays, CTs, MRIs) with their textual descriptions, enabling **zero-shot classification** and **semantic search** for radiology concepts.

## Performance (Test Set)

- **Batch-wise Recall@1:** 70.83% (state-of-the-art for T4 fine-tuning)
- **Batch-wise Recall@5:** 96.99%
- **Global Retrieval Recall@1:** ~6% (500x better than random chance)
- **Global Retrieval Recall@5:** ~16%

A lot of work still needs to be done here, as global retrieval recall remains quite low; an updated version will be released.
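
For reference, here is a minimal sketch of how batch-wise Recall@K can be computed from paired image/text embeddings. This is an illustration of the metric, not the evaluation script behind the numbers above; the function name and shapes are assumptions.

```python
import torch

def batch_recall_at_k(image_embeds: torch.Tensor, text_embeds: torch.Tensor, k: int) -> float:
    """Fraction of images whose paired caption (same row index) appears in the
    top-k most similar captions within the batch. Both inputs are (N, D)."""
    # Normalize so the dot product is cosine similarity
    image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
    text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
    sims = image_embeds @ text_embeds.T                 # (N, N) image-to-text similarities
    topk = sims.topk(k, dim=-1).indices                 # top-k caption indices per image
    targets = torch.arange(sims.size(0)).unsqueeze(-1)  # correct caption index per image
    return (topk == targets).any(dim=-1).float().mean().item()
```

Global retrieval is the same computation with the similarity matrix built over the entire test set rather than a single batch, which is why those numbers are much lower.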

## Usage

```python
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load the fine-tuned weights; the processor is the unmodified base CLIP processor
model = CLIPModel.from_pretrained("spicy03/CLIP-ROCO-v1")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Zero-shot classification: score an image against candidate labels
image = Image.open("chest_xray.jpg")
labels = ["Pneumonia", "Normal", "Edema"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=1)
print(probs)
```
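
For semantic search, a minimal retrieval sketch is shown below, reusing the `model` and `processor` loaded above. The file names and query text are placeholders for illustration; `get_image_features` and `get_text_features` are standard `CLIPModel` methods.

```python
import torch
from PIL import Image

# Hypothetical file names for illustration; use your own image collection
image_paths = ["scan_001.jpg", "scan_002.jpg", "scan_003.jpg"]
images = [Image.open(p) for p in image_paths]

with torch.no_grad():
    # Embed all images once and L2-normalize so dot products are cosine similarities
    image_inputs = processor(images=images, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)
    image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

    # Embed a free-text radiology query the same way
    query = "axial CT showing a pulmonary nodule"
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**text_inputs)
    text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

# Rank images by cosine similarity to the query
scores = (text_embeds @ image_embeds.T).squeeze(0)
for idx in scores.argsort(descending=True).tolist():
    print(image_paths[idx], round(scores[idx].item(), 3))
```

For larger collections, the image embeddings can be precomputed once and stored, so each query only requires a single text forward pass and a matrix multiply.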