---
tags:
- image-to-text
- image-captioning
- CLIP
- GPT-2
- dermatology
- dermlip
library_name: transformers
license: other
language:
- en
pipeline_tag: image-to-text
---

# DermLIP + GPT-2 Dermatology Captioner

A dermatology image-captioning model that combines the DermLIP vision encoder with the `gpt2-medium` language model. It was trained on dermatological images to generate clinical descriptions of skin lesions.

**Architecture**: DermLIP (ViT-B/16) → learnable prefix → GPT-2 (`gpt2-medium`).
Trained in two stages: Stage A (META) for generalization and Stage B (SkinCAP) for style/terminology.
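
The "learnable prefix" is ClipCap-style conditioning: the CLIP image embedding is projected to 32 pseudo-token embeddings that are prepended to GPT-2's input. A minimal sketch of the idea (the trained projector lives in `inference_min.py`; the class name and MLP shape here are illustrative, while the 512-d ViT-B/16 embedding, the 1024-d gpt2-medium hidden size, and the 32 prefix tokens come from this card):

```python
import torch
import torch.nn as nn

class PrefixProjector(nn.Module):
    """Maps one CLIP image embedding to a sequence of GPT-2 prefix embeddings."""

    def __init__(self, clip_dim: int = 512, gpt_dim: int = 1024, prefix_len: int = 32):
        super().__init__()
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        # Single-layer MLP for illustration; one vector in, prefix_len vectors out.
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len),
            nn.Tanh(),
        )

    def forward(self, image_emb: torch.Tensor) -> torch.Tensor:
        # (batch, clip_dim) -> (batch, prefix_len, gpt_dim); the result is
        # prepended to the caption token embeddings before GPT-2's forward pass.
        return self.proj(image_emb).view(-1, self.prefix_len, self.gpt_dim)
```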

## Metrics

| Stage | val_loss | PPL | BLEU | ROUGE-L | CIDEr-D | CLIP | BERT_F1 |
|---|---:|---:|---:|---:|---:|---:|---:|
| A (META) | 1.1070 | 3.03 | 38.6 | 0.550 | 0.17 | 24.4 | 0.565 |
| B (SkinCAP) | 1.1903 | 3.29 | 10.0 | 0.278 | 0.13 | 25.9 | 0.363 |

## Inference

> The minimal example uses `inference_min.py`, which ships in this repo.
> Requires: `pip install torch transformers open_clip_torch pillow huggingface_hub`

```python
import sys

from huggingface_hub import snapshot_download

# 1) Download the repo snapshot (weights, config, and the inference helper)
repo_dir = snapshot_download(
    "moxeeeem/dermlip-gpt2-captioner",
    allow_patterns=["*.pt", "*.json", "inference_min.py"],
)

# 2) Import the bundled helper, then load the model from the saved config/weights
sys.path.insert(0, repo_dir)
from inference_min import load_model, generate

model = load_model(repo_dir)  # builds the CLIP backbone + GPT-2 + prefix projector

# 3) Run generation on local test images
img_paths = ["/path/to/derma_image.jpg"]
caps = generate(
    model,
    img_paths,
    prompt=(
        "Describe the skin lesion concisely (morphology, color, scale, "
        "border, location) in one sentence."
        "Conclude with the most likely diagnosis (1–3 words)."
    ),
)
for cap in caps:
    print(cap)
```

## Files

| File | Size | Check |
|---|---:|---|
| `best_stageA.pt` | 2 GB | sha256[:12]=3219636f48b0 |
| `best_stageB.pt` | 2 GB | sha256[:12]=69bded2dcad1 |
| `final_captioner_gpt2-medium_VisionTransformer.json` | 849 B | sha256[:12]=e157402c9fe2 |
| `final_captioner_gpt2-medium_VisionTransformer.pt` | 2 GB | sha256[:12]=536ae07811c9 |
| `loss_dermlip_vitb16.png` | 110 KB | sha256[:12]=a04b1e5832d9 |
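
To verify a download against the truncated digests above, compare the first 12 hex characters of the file's SHA-256. A small sketch using only the standard library (`repo_dir` is the path returned by `snapshot_download` in the inference example):

```python
import hashlib
from pathlib import Path

def sha256_prefix(path, n=12, chunk_size=1 << 20):
    """Return the first n hex characters of a file's SHA-256, hashing in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()[:n]

assert sha256_prefix(Path(repo_dir) / "best_stageA.pt") == "3219636f48b0"
```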

## Details

- **Vision Encoder**: DermLIP (ViT-B/16)
- **Language Model**: GPT-2 (`gpt2-medium`)
- **CLIP weights**: `hf-hub:redlessone/DermLIP_ViT-B-16` (see the loading sketch after this list)
- **Prefix tokens**: 32
- **Training prompt**: `Describe the skin lesion concisely (morphology, color, scale, border, location) in one sentence.Conclude with the most likely diagnosis (1–3 words).` (reproduced verbatim, including the missing space after `sentence.`)
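
Since the vision weights are referenced with `open_clip`'s `hf-hub:` scheme and `open_clip_torch` is already in the install line, the DermLIP backbone should be loadable on its own. A minimal sketch, assuming the Hub checkpoint is `open_clip`-compatible:

```python
import open_clip

# Pull the DermLIP (ViT-B/16) weights straight from the Hugging Face Hub.
# `preprocess` is the matching eval-time image transform for the vision tower.
model, _, preprocess = open_clip.create_model_and_transforms(
    "hf-hub:redlessone/DermLIP_ViT-B-16"
)
tokenizer = open_clip.get_tokenizer("hf-hub:redlessone/DermLIP_ViT-B-16")
```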

### Model Type Detection

- Detected as: `dermlip`
- Repository: `moxeeeem/dermlip-gpt2-captioner`

_Auto-generated on 2025-08-30 09:25 UTC._