---
language:
- en
license: cc-by-4.0
tags:
- vision
- image-text-to-text
- medical
- dermatology
- multimodal
- clip
- zero-shot-classification
- image-classification
pipeline_tag: zero-shot-image-classification
library_name: transformers
---

# DermLIP: Dermatology Language-Image Pretraining

## Model Description

**DermLIP** is a vision-language model for dermatology, trained on the **Derm1M** dataset, the largest dermatological image-text corpus to date.

### Model Details

- **Model Type:** Pretrained vision-language model (CLIP-style)
- **Architecture:**
  - **Vision encoder:** ViT-B16
  - **Text encoder:** GPT2
- **Resolution:** 224×224 pixels
- **Paper:** https://arxiv.org/abs/2503.14911
- **Repository:** https://github.com/SiyuanYan1/Derm1M
- **License:** cc-by-nc-nd-4.0

## Training Details

- **Training data:** 403,563 skin image-text pairs from the Derm1M dataset, including both dermoscopic and clinical images
- **Training objective:** image-text contrastive loss
- **Hardware:** 1× NVIDIA H200
- **Hours used:** ~21.5 hours

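The image-text contrastive objective pairs each image with its caption in the batch and treats all other pairings as negatives. A minimal sketch of this symmetric InfoNCE-style loss on random stand-in features (illustration only, not the actual training code; the `temperature` value is an assumption):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive loss: the i-th image and i-th text are a positive
    pair; every other pairing in the batch serves as a negative."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.T / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))                   # matched pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)              # image -> text direction
    loss_t2i = F.cross_entropy(logits.T, targets)            # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy batch of 8 pairs with 512-dim features
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(f"contrastive loss: {loss.item():.4f}")
```
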
## Intended Uses

### Primary Use Cases

- Zero-shot classification
- Few-shot learning
- Cross-modal retrieval
- Concept annotation/explanation

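For few-shot learning, a common recipe is to freeze the model and fit a linear probe on its image embeddings. A minimal sketch, using random tensors as stand-ins for `model.encode_image` outputs (the shot count, class count, and hyperparameters are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-ins for frozen DermLIP image embeddings (512-dim) and labels;
# in practice: features = model.encode_image(images) for your few-shot set
features = torch.randn(60, 512)        # 10 shots x 6 classes
labels = torch.arange(6).repeat(10)    # 6 disease classes

probe = nn.Linear(512, 6)              # linear probe on frozen features
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(probe(features), labels)
    loss.backward()
    optimizer.step()

accuracy = (probe(features).argmax(dim=1) == labels).float().mean()
print(f"train accuracy: {accuracy:.2f}")
```

Only the small probe is trained, so this works with very few labeled examples per class.
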
## How to Use

### Installation

First, clone the Derm1M repository:

```bash
git clone git@github.com:SiyuanYan1/Derm1M.git
cd Derm1M
```

Then install the package following the instructions in the repository.

### Quick Start

```python
import torch
import open_clip
from PIL import Image

# Load model from the Hugging Face Hub checkpoint
model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:redlessone/DermLIP_PanDerm-base-w-PubMed-256'
)
model.eval()

# Initialize tokenizer
tokenizer = open_clip.get_tokenizer('hf-hub:redlessone/DermLIP_PanDerm-base-w-PubMed-256')

# Read example image
image = preprocess(Image.open("your_skin_image.png")).unsqueeze(0)

# Define disease labels (example: PAD dataset classes)
PAD_CLASSNAMES = [
    "nevus",
    "basal cell carcinoma",
    "actinic keratosis",
    "seborrheic keratosis",
    "squamous cell carcinoma",
    "melanoma",
]

# Build text prompts
template = lambda c: f"This is a skin image of {c}"
text = tokenizer([template(c) for c in PAD_CLASSNAMES])

# Inference
with torch.no_grad(), torch.autocast("cuda"):
    # Encode image and text
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Normalize features
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Compute similarity
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Get prediction
final_prediction = PAD_CLASSNAMES[torch.argmax(text_probs[0])]
print(f"This image is classified as {final_prediction}.")
print("Label probabilities:", text_probs)
```
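Cross-modal retrieval follows the same encode-and-normalize pattern: rank a gallery of images by cosine similarity to a text query. A sketch with random stand-in features in place of the DermLIP embeddings (gallery size and dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-ins for encoded features; in practice use model.encode_image / model.encode_text
gallery_image_features = F.normalize(torch.randn(100, 512), dim=-1)  # 100-image gallery
query_text_feature = F.normalize(torch.randn(1, 512), dim=-1)        # one text query

# Cosine similarity of the query against every gallery image
similarity = (query_text_feature @ gallery_image_features.T).squeeze(0)

# Indices of the five most similar gallery images
top5 = similarity.topk(5).indices
print("Top-5 gallery indices:", top5.tolist())
```

Image-to-text retrieval is symmetric: encode a query image and rank candidate captions by the same dot product.
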

## Contact

For any additional questions or comments, contact Siyuan Yan (`siyuan.yan@monash.edu`).

## Cite our Paper

```bibtex
@misc{yan2025derm1m,
  title         = {Derm1M: A Million-Scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology},
  author        = {Siyuan Yan and Ming Hu and Yiwen Jiang and Xieji Li and Hao Fei and Philipp Tschandl and Harald Kittler and Zongyuan Ge},
  year          = {2025},
  eprint        = {2503.14911},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2503.14911}
}

@article{yan2025multimodal,
  title     = {A multimodal vision foundation model for clinical dermatology},
  author    = {Yan, Siyuan and Yu, Zhen and Primiero, Clare and Vico-Alonso, Cristina and Wang, Zhonghua and Yang, Litao and Tschandl, Philipp and Hu, Ming and Ju, Lie and Tan, Gin and others},
  journal   = {Nature Medicine},
  pages     = {1--12},
  year      = {2025},
  publisher = {Nature Publishing Group}
}
```