---
language:
- en
license: cc-by-4.0
tags:
- vision
- image-text-to-text
- medical
- dermatology
- multimodal
- clip
- zero-shot-classification
- image-classification
pipeline_tag: zero-shot-image-classification
library_name: transformers
---
# DermLIP: Dermatology Language-Image Pretraining
## Model Description
**DermLIP** is a vision-language model for dermatology, trained on **Derm1M**, the largest dermatological image-text corpus to date.
### Model Details
- **Model Type:** Pretrained Vision-Language Model (CLIP-style)
- **Architecture:**
- **Vision encoder**: ViT-B16
- **Text encoder**: GPT2
- **Resolution:** 224×224 pixels
- **Paper:** https://arxiv.org/abs/2503.14911
- **Repository:** https://github.com/SiyuanYan1/Derm1M
- **License:** cc-by-nc-nd-4.0
## Training Details
- **Training data:** 403,563 skin image-text pairs from the Derm1M dataset, including both dermoscopic and clinical images.
- **Training objective:** image-text contrastive loss
- **Hardware:** 1× NVIDIA H200 (~40 GB memory usage)
- **Hours used:** ~5 hours
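The image-text contrastive objective is the symmetric CLIP-style loss: each image in a batch is pulled toward its paired caption and pushed away from all other captions, and vice versa. As a rough illustration (this is a generic sketch with made-up feature tensors, not the Derm1M training code):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_features, text_features, logit_scale=100.0):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity logits, scaled by a temperature
    logits = logit_scale * image_features @ text_features.T

    # The i-th image matches the i-th text in the batch
    targets = torch.arange(logits.shape[0])

    # Average the image-to-text and text-to-image cross-entropy terms
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy batch of 4 pairs with 512-dimensional features
loss = contrastive_loss(torch.randn(4, 512), torch.randn(4, 512))
```

In the real training loop the features come from the ViT-B/16 vision encoder and GPT-2 text encoder, and the temperature (`logit_scale`) is a learnable parameter.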
## Intended Uses
### Primary Use Cases
- Zero-shot classification
- Few-shot learning
- Cross-modal retrieval
- Concept annotation/explanation
## How to Use
### Installation
First, clone the Derm1M repository:
```bash
git clone https://github.com/SiyuanYan1/Derm1M.git
cd Derm1M
```
Then install the package following the instructions in the repository.
### Quick Start
```python
import torch
from PIL import Image
import open_clip

# Load the model from the Hugging Face Hub checkpoint
model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:redlessone/DermLIP_ViT-B-16'
)
model.eval()

# Initialize the matching tokenizer
tokenizer = open_clip.get_tokenizer('hf-hub:redlessone/DermLIP_ViT-B-16')

# Read and preprocess an example image
image = preprocess(Image.open("your_skin_image.png")).unsqueeze(0)

# Define disease labels (example: PAD dataset classes)
PAD_CLASSNAMES = [
    "nevus",
    "basal cell carcinoma",
    "actinic keratosis",
    "seborrheic keratosis",
    "squamous cell carcinoma",
    "melanoma",
]

# Build one text prompt per class
template = lambda c: f"This is a skin image of {c}"
text = tokenizer([template(c) for c in PAD_CLASSNAMES])

# Inference
with torch.no_grad():
    # Encode image and text
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Normalize features to unit length
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Cosine similarity, scaled and converted to class probabilities
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Report the top prediction
final_prediction = PAD_CLASSNAMES[torch.argmax(text_probs[0])]
print(f"This image is diagnosed as {final_prediction}.")
print("Label probabilities:", text_probs)
```
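The same encoders also support the cross-modal retrieval use case listed above: rank a gallery of images against a text query (or vice versa) by cosine similarity. A minimal sketch of the ranking step, using random tensors standing in for `encode_image`/`encode_text` outputs and placeholder file names:

```python
import torch

def retrieve_top_k(text_features, image_features, paths, k=2):
    """Rank gallery images by cosine similarity to a single text query."""
    # Normalize both sides so the dot product is cosine similarity
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

    # One similarity score per gallery image
    scores = (text_features @ image_features.T).squeeze(0)

    # Return the k best-matching paths, most similar first
    top = scores.topk(k).indices.tolist()
    return [paths[i] for i in top]

# Toy gallery: random 512-dim features for three placeholder images
gallery_features = torch.randn(3, 512)
query_features = torch.randn(1, 512)
top_paths = retrieve_top_k(query_features, gallery_features,
                           ["img1.png", "img2.png", "img3.png"])
```

In practice, `gallery_features` would be precomputed once with `model.encode_image` over your image collection, and `query_features` produced per query with `model.encode_text`.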
## Contact
For any additional questions or comments, contact Siyuan Yan (`siyuan.yan@monash.edu`).
## Cite our Paper
```bibtex
@misc{yan2025derm1m,
title = {Derm1M: A Million-Scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology},
author = {Siyuan Yan and Ming Hu and Yiwen Jiang and Xieji Li and Hao Fei and Philipp Tschandl and Harald Kittler and Zongyuan Ge},
year = {2025},
eprint = {2503.14911},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2503.14911}
}
@article{yan2025multimodal,
title={A multimodal vision foundation model for clinical dermatology},
author={Yan, Siyuan and Yu, Zhen and Primiero, Clare and Vico-Alonso, Cristina and Wang, Zhonghua and Yang, Litao and Tschandl, Philipp and Hu, Ming and Ju, Lie and Tan, Gin and others},
journal={Nature Medicine},
pages={1--12},
year={2025},
publisher={Nature Publishing Group}
}
``` |