Upload folder using huggingface_hub
- README.md +55 -0
- config.json +48 -0
- model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,55 @@
---
language: en
tags:
- clip
- medical-imaging
- radiology
- roco
- vision-language
base_model: openai/clip-vit-base-patch32
metrics:
- recall
license: mit
---

# ROCO-Radiology-CLIP (ViT-B/32)

> **A specialized vision-language model for radiology, fine-tuned on the ROCO dataset.**

This model aligns medical images (X-rays, CTs, MRIs) with their textual descriptions, enabling **zero-shot classification** and **semantic search** for radiology concepts.

## Performance (Test Set)

| Metric | Score | Description |
| :--- | :--- | :--- |
| **Batch-wise R@1** | **70.8%** | How often the correct image ranks first among 32 in-batch candidates. |
| **Batch-wise R@5** | **97.0%** | How often the correct image appears in the top 5 of 32 in-batch candidates. |
| **Global R@5** | **16.18%** | Retrieval recall across the full test set (8,000+ images). |
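
*Batch-wise* scores are computed within batches of 32 image–caption pairs, while the *global* score searches the entire test pool, which is why it is much lower. The evaluation script is not part of this upload; the sketch below shows one standard way to compute batch-wise R@k from normalized CLIP embeddings (function name and shapes are illustrative, not taken from the original code).

```python
import torch

def batchwise_recall_at_k(image_embeds: torch.Tensor,
                          text_embeds: torch.Tensor,
                          k: int = 5) -> float:
    """Text-to-image recall@k within one batch of matching pairs.

    image_embeds, text_embeds: (B, D) L2-normalized embeddings where
    row i of both tensors comes from the same image-caption pair.
    """
    sims = text_embeds @ image_embeds.T              # (B, B) cosine similarities
    topk = sims.topk(k, dim=1).indices               # top-k image indices per caption
    targets = torch.arange(sims.size(0)).unsqueeze(1)
    hits = (topk == targets).any(dim=1)              # is the paired image in the top k?
    return hits.float().mean().item()
```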

## 🚀 Usage

```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch

model_id = "spicy03/CLIP-ROCO-v1"
model = CLIPModel.from_pretrained(model_id)
# The processor (tokenizer + image transforms) is unchanged from the base model.
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("chest_xray.jpg")
labels = ["Pneumonia", "Normal Chest X-ray", "Brain MRI"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)

for label, prob in zip(labels, probs[0]):
    print(f"{label}: {prob:.2f}")
```
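
For the semantic-search use case mentioned above, you can embed images and text queries separately and rank by cosine similarity. A minimal sketch, continuing from the snippet above (the query string is illustrative):

```python
import torch

# Embed one image and one text query into the shared 512-d space.
with torch.no_grad():
    img_inputs = processor(images=[image], return_tensors="pt")
    image_embeds = model.get_image_features(**img_inputs)
    txt_inputs = processor(text=["pleural effusion on chest x-ray"],
                           return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**txt_inputs)

# L2-normalize, then score by dot product (= cosine similarity).
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = text_embeds @ image_embeds.T   # higher = better match
```

In practice you would precompute `image_embeds` for the whole collection once and only embed the query at search time.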

## Training Details

- **Dataset:** ROCO (Radiology Objects in COntext)
- **Base Model:** openai/clip-vit-base-patch32
- **Hardware:** fine-tuned on a single NVIDIA T4 GPU using mixed precision and gradient accumulation (sketched below).
- **Epochs:** 5, with the best checkpoint selected by validation loss.
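
The training script is not included in this upload. The sketch below shows the kind of loop those hardware notes imply — mixed precision via `torch.amp` plus gradient accumulation — with the dataloader, learning rate, and accumulation steps all being assumptions, not the actual configuration:

```python
import torch
from transformers import CLIPModel

def finetune_clip(dataloader, epochs: int = 5, accum_steps: int = 8):
    """Sketch: mixed-precision CLIP fine-tuning with gradient accumulation.

    `dataloader` is assumed to yield CLIPProcessor(...) batches of paired
    radiology images and captions; all hyperparameters are illustrative.
    """
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    scaler = torch.amp.GradScaler()

    for _ in range(epochs):
        for step, batch in enumerate(dataloader):
            batch = {k: v.cuda() for k, v in batch.items()}
            with torch.autocast(device_type="cuda", dtype=torch.float16):
                # return_loss=True makes CLIPModel compute the symmetric
                # contrastive (InfoNCE) loss over the batch itself.
                loss = model(**batch, return_loss=True).loss / accum_steps
            scaler.scale(loss).backward()
            if (step + 1) % accum_steps == 0:
                scaler.step(optimizer)   # unscale and apply gradients
                scaler.update()
                optimizer.zero_grad()
    return model
```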
config.json
ADDED
@@ -0,0 +1,48 @@
{
  "architectures": [
    "CLIPModel"
  ],
  "dtype": "float32",
  "initializer_factor": 1.0,
  "logit_scale_init_value": 2.6592,
  "model_type": "clip",
  "projection_dim": 512,
  "text_config": {
    "attention_dropout": 0.0,
    "bos_token_id": 0,
    "dropout": 0.0,
    "dtype": "float32",
    "eos_token_id": 2,
    "hidden_act": "quick_gelu",
    "hidden_size": 512,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 2048,
    "layer_norm_eps": 1e-05,
    "max_position_embeddings": 77,
    "model_type": "clip_text_model",
    "num_attention_heads": 8,
    "num_hidden_layers": 12,
    "projection_dim": 512,
    "vocab_size": 49408
  },
  "transformers_version": "4.57.3",
  "vision_config": {
    "attention_dropout": 0.0,
    "dropout": 0.0,
    "dtype": "float32",
    "hidden_act": "quick_gelu",
    "hidden_size": 768,
    "image_size": 224,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-05,
    "model_type": "clip_vision_model",
    "num_attention_heads": 12,
    "num_channels": 3,
    "num_hidden_layers": 12,
    "patch_size": 32,
    "projection_dim": 512
  }
}
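
These are the standard `openai/clip-vit-base-patch32` architecture hyperparameters; fine-tuning changed only the weights. To read them back programmatically (repo id as published above):

```python
from transformers import CLIPConfig

# Load this config straight from the Hub and inspect a few fields.
config = CLIPConfig.from_pretrained("spicy03/CLIP-ROCO-v1")
print(config.projection_dim)                        # 512-d shared embedding space
print(config.vision_config.patch_size)              # 32 -> the "patch32" in ViT-B/32
print(config.text_config.max_position_embeddings)   # 77-token text limit
```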
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34fe0f873d5b2cbdacde13af95df85f1c90c3bdab978c58d4493551019029905
size 605156676
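
This file is a Git LFS pointer: the 605 MB float32 checkpoint itself is stored out of band and identified by the SHA-256 above. A minimal sketch for verifying a downloaded copy against that oid (local path is illustrative):

```python
import hashlib

# Stream the file in 1 MiB chunks so the 605 MB checkpoint
# never has to fit in memory at once.
sha = hashlib.sha256()
with open("model.safetensors", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

expected = "34fe0f873d5b2cbdacde13af95df85f1c90c3bdab978c58d4493551019029905"
print("OK" if sha.hexdigest() == expected else "checksum mismatch")
```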