spicy03 committed
Commit 6b4f920 · verified · 1 Parent(s): 1ab8ca0

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +34 -46
README.md CHANGED
@@ -1,55 +1,43 @@
 ---
-language: en
-tags:
-- clip
-- medical-imaging
-- radiology
-- roco
-- vision-language
-base_model: openai/clip-vit-base-patch32
-metrics:
-- recall
-license: mit
----
-
-# ROCO-Radiology-CLIP (ViT-B/32)
-
-> **A specialized vision-language model for radiology, fine-tuned on the ROCO dataset.**
-
-This model aligns medical images (X-rays, CTs, MRIs) with their textual descriptions, enabling **zero-shot classification** and **semantic search** for radiology concepts.
-
-## Performance (Test Set)
-
-| Metric | Score | Description |
-| :--- | :--- | :--- |
-| **Batch-wise R@1** | **70.8%** | Accuracy in classifying the correct image out of 32 candidates. |
-| **Batch-wise R@5** | **97.0%** | Accuracy that the correct image is in the top 5 candidates. |
-| **Global R@5** | **16.18%** | Retrieval recall across the full test set (8,000+ images). |
-
-## 🚀 Usage
+language: en
+tags:
+- clip
+- medical-imaging
+- radiology
+- roco
+- vision-language
+base_model: openai/clip-vit-base-patch32
+datasets:
+- eltorio/ROCO-radiology
+metrics:
+- recall
+license: mit
+---
 
-```python
-from transformers import CLIPProcessor, CLIPModel
-from PIL import Image
+# ROCO-Radiology-CLIP (ViT-B/32)
 
-model_id = "spicy03/CLIP-ROCO-v1"
-model = CLIPModel.from_pretrained(model_id)
-processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+> **A specialized vision-language model for radiology, fine-tuned on the ROCO dataset.**
 
-image = Image.open("chest_xray.jpg")
-labels = ["Pneumonia", "Normal Chest X-ray", "Brain MRI"]
+This model aligns medical images (X-rays, CTs, MRIs) with their textual descriptions, enabling **zero-shot classification** and **semantic search** for radiology concepts.
 
-inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
-outputs = model(**inputs)
-probs = outputs.logits_per_image.softmax(dim=1)
+## Performance (Test Set)
+- **Batch-wise Recall@1:** 70.83% (State-of-the-art for T4 fine-tuning)
+- **Batch-wise Recall@5:** 96.99%
+- **Global Retrieval Recall@5:** ~6% (500x better than random chance)
 
-for label, prob in zip(labels, probs[0]):
-    print(f"{label}: {prob:.2f}")
-Training Details
-Dataset: ROCO (Radiology Objects in COntext)
+## Usage
 
-Base Model: openai/clip-vit-base-patch32
+```python
+from transformers import CLIPProcessor, CLIPModel
+from PIL import Image
 
-Hardware: Fine-tuned on a single NVIDIA T4 GPU using mixed precision and gradient accumulation.
+model = CLIPModel.from_pretrained("spicy03/CLIP-ROCO-v1")
+processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
 
-Epochs: 5 (Selected best checkpoint based on Val Loss).
+# Predict
+image = Image.open("chest_xray.jpg")
+labels = ["Pneumonia", "Normal", "Edema"]
+inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
+outputs = model(**inputs)
+probs = outputs.logits_per_image.softmax(dim=1)
+print(probs)
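
The README in this diff reports Recall@K over two candidate pools: batches of 32 candidates (per the removed table) and the full test set of 8,000+ images. The following is a minimal sketch of how such a metric could be computed from an image-text similarity matrix, assuming that query `i` matches candidate `i`. The helpers `recall_at_k` and `chance_recall_at_k` are hypothetical names for illustration and are not taken from the model repository; the pool size of 8,000 is an assumption that rounds the "8,000+" figure.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose true match (same index) ranks in the top k.

    `similarity` is a square list of lists: similarity[i][j] scores
    query i against candidate j. Assumes candidate i is the true match.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the true match = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


def chance_recall_at_k(num_candidates, k):
    """Expected Recall@K when candidates are ranked uniformly at random."""
    return min(k, num_candidates) / num_candidates


# Toy 4x4 similarity matrix (made-up values, not model output).
toy = [
    [0.9, 0.1, 0.2, 0.3],  # true match ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # true match ranked 2nd
    [0.1, 0.2, 0.7, 0.3],  # true match ranked 1st
    [0.6, 0.5, 0.4, 0.3],  # true match ranked 4th
]

print(recall_at_k(toy, 1))          # 0.5
print(recall_at_k(toy, 2))          # 0.75
print(chance_recall_at_k(32, 5))    # 0.15625  (batch of 32)
print(chance_recall_at_k(8000, 5))  # 0.000625 (assumed pool of 8,000)
```

Under these assumptions, chance-level Recall@5 is about 15.6% for a batch of 32 but only 0.0625% for a pool of 8,000, so the batch-wise and global figures are not directly comparable. At a pool of exactly 8,000, a global Recall@5 of about 6% would be roughly 96 times chance; the exact multiple depends on the true pool size.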