Adding `safetensors` variant of this model

#1
by SFconvertbot - opened
Files changed (1)
  1. README.md +24 -144
README.md CHANGED
@@ -1,140 +1,43 @@
  ---
- language:
- - en
- license: gpl-3.0
- library_name: transformers
  tags:
  - clip
- - vision
- - medical
  - bert
- pipeline_tag: zero-shot-image-classification
- widget:
- - src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_09402.jpg
-   candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan, Ultrasound, OPG
-   example_title: Abdomen CT Scan
- - src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_00319.jpg
-   candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan, Ultrasound, OPG
-   example_title: Chest X-Ray
- - src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_00016.jpg
-   candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan, Ultrasound, OPG
-   example_title: MRI
- - src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_02259.jpg
-   candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan, Ultrasound, OPG
-   example_title: Ultrasound
- base_model: openai/clip-vit-large-patch14
  ---
 
  # RCLIP (Clip model fine-tuned on radiology images and their captions)
  This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as an image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as a text encoder on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
  It achieves the following results on the evaluation set:
  - Loss: 0.3388

  ## Heatmap
  Here is the heatmap of the similarity score of the first 30 samples on the test split of the ROCO dataset of images vs their captions:
  ![heatmap](https://imgur.com/fPFM694.png)
 
- ## Image Retrieval
- This model can be utilized for image retrieval purposes, as demonstrated below:
-
- ### 1-Save Image Embeddings
- <details>
- <summary>click to show the code</summary>
-
- ```python
- from PIL import Image
- import numpy as np
- import pickle, os, torch
- from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor
-
- # load model
- model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
- processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")
-
- # TO-DO
- images_path = "/path/to/images/"
- images = [os.path.join(images_path,i) for i in os.listdir(images_path) if i.endswith(".jpg")]
-
- # generate embeddings of images in your dataset
- image_embeds = []
- for img in images:
-     with torch.no_grad():
-         inputs = processor(text=None, images=Image.open(img), return_tensors="pt", padding=True)
-         outputs = model.get_image_features(**inputs)[0].numpy()
-     image_embeds.append(outputs)
-
- # save images embeddings in a pickle file
- with open("embeddings.pkl", 'wb') as f:
-     pickle.dump(np.array(image_embeds), f)
- ```
- </details>
-
- ### 2-Query for Images
- ```python
- import numpy as np
- from sklearn.metrics.pairwise import cosine_similarity
- from PIL import Image
- import pickle, torch, os
- from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor
-
- # search a query in embeddings
- query = "Chest X-Ray photos"
-
- # embed the query
- inputs = processor(text=query, images=None, return_tensors="pt", padding=True)
- with torch.no_grad():
-     query_embedding = model.get_text_features(**inputs)[0].numpy()
-
- # load image embeddings
- with open("embeddings.pkl", 'rb') as f:
-     image_embeds = pickle.load(f)
-
- # find similar images indices
- def find_k_similar_images(query_embedding, image_embeds, k=2):
-     similarities = cosine_similarity(query_embedding.reshape(1, -1), image_embeds)
-     closest_indices = np.argsort(similarities[0])[::-1][:k]
-     return closest_indices
- similar_image_indices = find_k_similar_images(query_embedding, image_embeds, k=k)
-
- # TO-DO
- images_path = "/path/to/images/"
- images = [os.path.join(images_path,i) for i in os.listdir(images_path) if i.endswith(".jpg")]
-
- # get image paths
- similar_image_names = [images[index] for index in similar_image_indices]
- Image.open(similar_image_names[0])
- ```
-
- ## Zero-Shot Image Classification
- This model can be effectively employed for zero-shot image classification, as exemplified below:
- ```python
- import requests
- from PIL import Image
- import matplotlib.pyplot as plt
-
- from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor
-
- model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
- processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

- url = "https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_09402.jpg"
- image = Image.open(requests.get(url, stream=True).raw)
- possible_class_names = ["Chest X-Ray", "Brain MRI", "Abdominal CT Scan", "Ultrasound", "OPG"]
-
- inputs = processor(text=possible_class_names, images=image, return_tensors="pt", padding=True)
- probs = model(**inputs).logits_per_image.softmax(dim=1).squeeze()

- print("".join([x[0] + ": " + x[1] + "\n" for x in zip(possible_class_names, [format(prob, ".4%") for prob in probs])]))
- image
- ```
 
- ## Metrics
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:-----:|:---------------:|
- | 0.0974 | 4.13 | 22500 | 0.3388 |
- <details>
- <summary>expand to view all steps</summary>
-
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:-----:|:---------------:|
  | 0.7951 | 0.09 | 500 | 1.1912 |
@@ -183,33 +86,10 @@ image
  | 0.0983 | 4.04 | 22000 | 0.3390 |
  | 0.0974 | 4.13 | 22500 | 0.3388 |

- </details>
 
- ## Hyperparameters
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 24
- - eval_batch_size: 24
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 500
- - num_epochs: 8.0
 
- ## Framework Versions
  - Transformers 4.31.0.dev0
  - Pytorch 2.0.1+cu117
  - Datasets 2.13.1
- - Tokenizers 0.13.3
-
- ## Citation
- ```bibtex
- @misc{https://doi.org/10.57967/hf/0896,
-   doi = {10.57967/HF/0896},
-   url = {https://huggingface.co/kaveh/rclip},
-   author = {{Kaveh Shahhosseini}},
-   title = {rclip},
-   publisher = {Hugging Face},
-   year = {2023}
- }
- ```
 
  ---
  tags:
+ - generated_from_trainer
  - clip
  - bert
+ - vision-language models
+ model-index:
+ - name: output_8_clip14_cxrbert
+   results: []
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: feature-extraction
  ---
 
  # RCLIP (Clip model fine-tuned on radiology images and their captions)
+
  This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as an image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as a text encoder on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
  It achieves the following results on the evaluation set:
  - Loss: 0.3388

  ## Heatmap
+
  Here is the heatmap of the similarity score of the first 30 samples on the test split of the ROCO dataset of images vs their captions:
  ![heatmap](https://imgur.com/fPFM694.png)
 
+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 24
+ - eval_batch_size: 24
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 500
+ - num_epochs: 8.0
 
+ ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:-----:|:---------------:|
  | 0.7951 | 0.09 | 500 | 1.1912 |
 
  | 0.0983 | 4.04 | 22000 | 0.3390 |
  | 0.0974 | 4.13 | 22500 | 0.3388 |


+ ### Framework versions

  - Transformers 4.31.0.dev0
  - Pytorch 2.0.1+cu117
  - Datasets 2.13.1
+ - Tokenizers 0.13.3
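
The hyperparameters in the README list a `cosine` scheduler with 500 warmup steps and a peak learning rate of 5e-05. As a rough illustration of what that combination usually means, the sketch below computes a linear warmup followed by a half-cosine decay in plain Python. The function name `cosine_lr_with_warmup` and the `TOTAL_STEPS` value are assumptions made for this example: the total step count is not stated in the README and is only extrapolated here from the results table (step 22500 at epoch 4.13, with 8 epochs configured). Whether the training run followed exactly this formula is also an assumption.

```python
import math


def cosine_lr_with_warmup(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to at most 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


BASE_LR = 5e-05        # learning_rate from the README
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps from the README
TOTAL_STEPS = 43_584   # assumption: extrapolated, not stated in the README

for step in (0, 250, 500, 22_500, TOTAL_STEPS):
    print(step, cosine_lr_with_warmup(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```

Under these assumptions the rate peaks at step 500 and has fallen to roughly half of its peak by the last step shown in the results table.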