Adding `safetensors` variant of this model
#1
opened by SFconvertbot
README.md CHANGED
@@ -1,140 +1,43 @@
---
language:
- en
license: gpl-3.0
library_name: transformers
tags:
- clip
- vision
- medical
- bert
widget:
- src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_00016.jpg
  candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan, Ultrasound, OPG
  example_title: MRI
- src: https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_02259.jpg
  candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan, Ultrasound, OPG
  example_title: Ultrasound
base_model: openai/clip-vit-large-patch14
---

# RCLIP (CLIP model fine-tuned on radiology images and their captions)

This model pairs [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as the image encoder with [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as the text encoder, fine-tuned on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
It achieves the following results on the evaluation set:
- Loss: 0.3388

## Heatmap

Here is a heatmap of the image-caption similarity scores for the first 30 samples of the ROCO test split:

![heatmap](https://huggingface.co/kaveh/rclip/resolve/main/heatmap.png)

## Image Retrieval

This model can be used for image retrieval, as demonstrated below:

### 1. Save Image Embeddings

<details>
<summary>click to show the code</summary>

```python
from PIL import Image
import numpy as np
import pickle, os, torch
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

# load the model and processor
model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# point this at your own image folder
images_path = "/path/to/images/"
images = [os.path.join(images_path, i) for i in os.listdir(images_path) if i.endswith(".jpg")]

# generate an embedding for every image in the dataset
image_embeds = []
for img in images:
    with torch.no_grad():
        inputs = processor(text=None, images=Image.open(img), return_tensors="pt", padding=True)
        outputs = model.get_image_features(**inputs)[0].numpy()
    image_embeds.append(outputs)

# save the image embeddings in a pickle file
with open("embeddings.pkl", "wb") as f:
    pickle.dump(np.array(image_embeds), f)
```
</details>

### 2. Query for Images

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from PIL import Image
import pickle, torch, os
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

# load the model and processor
model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# the text query to search for
query = "Chest X-Ray photos"

# embed the query
inputs = processor(text=query, images=None, return_tensors="pt", padding=True)
with torch.no_grad():
    query_embedding = model.get_text_features(**inputs)[0].numpy()

# load the image embeddings saved in step 1
with open("embeddings.pkl", "rb") as f:
    image_embeds = pickle.load(f)

# find the indices of the most similar images
def find_k_similar_images(query_embedding, image_embeds, k=2):
    similarities = cosine_similarity(query_embedding.reshape(1, -1), image_embeds)
    closest_indices = np.argsort(similarities[0])[::-1][:k]
    return closest_indices

similar_image_indices = find_k_similar_images(query_embedding, image_embeds, k=2)

# map indices back to image paths (same folder and ordering as in step 1)
images_path = "/path/to/images/"
images = [os.path.join(images_path, i) for i in os.listdir(images_path) if i.endswith(".jpg")]
similar_image_names = [images[index] for index in similar_image_indices]
Image.open(similar_image_names[0])
```

## Zero-Shot Image Classification

This model can also be used for zero-shot image classification, as shown below:

```python
import requests
import torch
from PIL import Image
import matplotlib.pyplot as plt
from transformers import VisionTextDualEncoderModel, VisionTextDualEncoderProcessor

model = VisionTextDualEncoderModel.from_pretrained("kaveh/rclip")
processor = VisionTextDualEncoderProcessor.from_pretrained("kaveh/rclip")

# example image and candidate labels taken from the widget examples above
url = "https://huggingface.co/spaces/kaveh/radiology-image-retrieval/resolve/main/images/ROCO_00016.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["Chest X-Ray", "Brain MRI", "Abdomen CT Scan", "Ultrasound", "OPG"]

# score the image against every candidate label
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)[0]

# plot the label probabilities
plt.bar(labels, probs.numpy())
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

image
```

## Metrics

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.0974        | 4.13  | 22500 | 0.3388          |

<details>
<summary>expand to view all steps</summary>

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.7951        | 0.09  | 500   | 1.1912          |
| ...           | ...   | ...   | ...             |
| 0.0983        | 4.04  | 22000 | 0.3390          |
| 0.0974        | 4.13  | 22500 | 0.3388          |

</details>

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 8.0
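
For reference, these settings map onto the standard `transformers` `TrainingArguments` fields roughly as sketched below. This is an illustration only, not the original training script; the batch sizes are assumed to be per device, and the output directory name is illustrative.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters onto TrainingArguments
training_args = TrainingArguments(
    output_dir="output_8_clip14_cxrbert",  # illustrative output directory
    learning_rate=5e-5,
    per_device_train_batch_size=24,        # reported train_batch_size
    per_device_eval_batch_size=24,         # reported eval_batch_size
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=8.0,
)
```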

## Framework Versions

- Transformers 4.31.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3

## Citation

```bibtex
@misc{https://doi.org/10.57967/hf/0896,
  doi       = {10.57967/HF/0896},
  url       = {https://huggingface.co/kaveh/rclip},
  author    = {{Kaveh Shahhosseini}},
  title     = {rclip},
  publisher = {Hugging Face},
  year      = {2023}
}
```

---
tags:
- generated_from_trainer
- clip
- bert
- vision-language models
model-index:
- name: output_8_clip14_cxrbert
  results: []
language:
- en
library_name: transformers
pipeline_tag: feature-extraction
---

# RCLIP (CLIP model fine-tuned on radiology images and their captions)

This model pairs [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as the image encoder with [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as the text encoder, fine-tuned on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
It achieves the following results on the evaluation set:
- Loss: 0.3388

## Heatmap

Here is a heatmap of the image-caption similarity scores for the first 30 samples of the ROCO test split:

![heatmap](https://huggingface.co/kaveh/rclip/resolve/main/heatmap.png)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 8.0

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.7951        | 0.09  | 500   | 1.1912          |
| ...           | ...   | ...   | ...             |
| 0.0983        | 4.04  | 22000 | 0.3390          |
| 0.0974        | 4.13  | 22500 | 0.3388          |

### Framework versions

- Transformers 4.31.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3