---
language:
- en
license: cc-by-4.0
tags:
- vision
- image-text-to-text
- medical
- dermatology
- multimodal
- clip
- zero-shot-classification
- image-classification
pipeline_tag: zero-shot-image-classification
library_name: transformers
---

# DermLIP: Dermatology Language-Image Pretraining

## Model Description

**DermLIP** is a vision-language model for dermatology, trained on the **Derm1M** dataset, the largest dermatological image-text corpus to date.

### Model Details

- **Model Type:** Pretrained vision-language model (CLIP-style)
- **Architecture:**
  - **Vision encoder:** ViT-B16
  - **Text encoder:** GPT2
- **Resolution:** 224×224 pixels
- **Paper:** https://arxiv.org/abs/2503.14911
- **Repository:** https://github.com/SiyuanYan1/Derm1M
- **License:** cc-by-nc-nd-4.0

## Training Details

- **Training data:** 403,563 skin image-text pairs from the Derm1M dataset, including both dermoscopic and clinical images
- **Training objective:** image-text contrastive loss
- **Hardware:** 1× NVIDIA H200
- **Hours used:** ~21.5 hours

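The image-text contrastive objective pairs each image with its caption in the batch and treats all other pairings as negatives. A minimal sketch of this symmetric InfoNCE-style loss on random stand-in features (illustration only, not the actual training code; the `temperature` value is an assumption):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive loss: the i-th image and i-th text are a positive
    pair; every other pairing in the batch serves as a negative."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.T / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))                   # matched pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)              # image -> text direction
    loss_t2i = F.cross_entropy(logits.T, targets)            # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy batch of 8 pairs with 512-dim features
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(f"contrastive loss: {loss.item():.4f}")
```
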
## Intended Uses

### Primary Use Cases

- Zero-shot classification
- Few-shot learning
- Cross-modal retrieval
- Concept annotation/explanation

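For few-shot learning, a common recipe is to freeze the model and fit a linear probe on its image embeddings. A minimal sketch, using random tensors as stand-ins for `model.encode_image` outputs (the shot count, class count, and hyperparameters are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-ins for frozen DermLIP image embeddings (512-dim) and labels;
# in practice: features = model.encode_image(images) for your few-shot set
features = torch.randn(60, 512)        # 10 shots x 6 classes
labels = torch.arange(6).repeat(10)    # 6 disease classes

probe = nn.Linear(512, 6)              # linear probe on frozen features
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(probe(features), labels)
    loss.backward()
    optimizer.step()

accuracy = (probe(features).argmax(dim=1) == labels).float().mean()
print(f"train accuracy: {accuracy:.2f}")
```

Only the small probe is trained, so this works with very few labeled examples per class.
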
## How to Use

### Installation

First, clone the Derm1M repository:

```bash
git clone git@github.com:SiyuanYan1/Derm1M.git
cd Derm1M
```

Then install the package following the instructions in the repository.

### Quick Start

```python
import torch
import open_clip
from PIL import Image

# Load model from the Hugging Face Hub checkpoint
model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:redlessone/DermLIP_PanDerm-base-w-PubMed-256'
)
model.eval()

# Initialize tokenizer
tokenizer = open_clip.get_tokenizer('hf-hub:redlessone/DermLIP_PanDerm-base-w-PubMed-256')

# Read example image
image = preprocess(Image.open("your_skin_image.png")).unsqueeze(0)

# Define disease labels (example: PAD dataset classes)
PAD_CLASSNAMES = [
    "nevus",
    "basal cell carcinoma",
    "actinic keratosis",
    "seborrheic keratosis",
    "squamous cell carcinoma",
    "melanoma",
]

# Build text prompts
template = lambda c: f"This is a skin image of {c}"
text = tokenizer([template(c) for c in PAD_CLASSNAMES])

# Inference
with torch.no_grad(), torch.autocast("cuda"):
    # Encode image and text
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Normalize features
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Compute similarity
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Get prediction
final_prediction = PAD_CLASSNAMES[torch.argmax(text_probs[0])]
print(f"This image is classified as {final_prediction}.")
print("Label probabilities:", text_probs)
```
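Cross-modal retrieval follows the same encode-and-normalize pattern: rank a gallery of images by cosine similarity to a text query. A sketch with random stand-in features in place of the DermLIP embeddings (gallery size and dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-ins for encoded features; in practice use model.encode_image / model.encode_text
gallery_image_features = F.normalize(torch.randn(100, 512), dim=-1)  # 100-image gallery
query_text_feature = F.normalize(torch.randn(1, 512), dim=-1)        # one text query

# Cosine similarity of the query against every gallery image
similarity = (query_text_feature @ gallery_image_features.T).squeeze(0)

# Indices of the five most similar gallery images
top5 = similarity.topk(5).indices
print("Top-5 gallery indices:", top5.tolist())
```

Image-to-text retrieval is symmetric: encode a query image and rank candidate captions by the same dot product.
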

## Contact

For any additional questions or comments, contact Siyuan Yan (`siyuan.yan@monash.edu`).

## Cite our Paper

```bibtex
@misc{yan2025derm1m,
  title         = {Derm1M: A Million-Scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology},
  author        = {Siyuan Yan and Ming Hu and Yiwen Jiang and Xieji Li and Hao Fei and Philipp Tschandl and Harald Kittler and Zongyuan Ge},
  year          = {2025},
  eprint        = {2503.14911},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2503.14911}
}

@article{yan2025multimodal,
  title     = {A multimodal vision foundation model for clinical dermatology},
  author    = {Yan, Siyuan and Yu, Zhen and Primiero, Clare and Vico-Alonso, Cristina and Wang, Zhonghua and Yang, Litao and Tschandl, Philipp and Hu, Ming and Ju, Lie and Tan, Gin and others},
  journal   = {Nature Medicine},
  pages     = {1--12},
  year      = {2025},
  publisher = {Nature Publishing Group}
}
```