---
tags:
- image-to-text
- image-captioning
- CLIP
- GPT-2
- dermatology
- dermlip
library_name: transformers
license: other
language:
- en
pipeline_tag: image-to-text
---

# DermLIP + GPT-2 Dermatology Captioner

A dermatology image-captioning model that combines the DermLIP vision encoder with the gpt2-medium language model. It is trained on dermatological images to generate clinical descriptions of skin lesions.

**Architecture**: DermLIP (ViT-B/16) → learnable prefix → GPT-2 (`gpt2-medium`).
Trained in two stages: Stage A (META) for generalization and Stage B (SkinCAP) for style and terminology.

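For orientation, the sketch below shows the general prefix-captioning pattern this architecture follows; it is an illustrative assumption, not the repo's actual implementation (that lives in `inference_min.py`). A CLIP image embedding (assumed 512-d for ViT-B/16) is projected to 32 prefix embeddings and prepended to the GPT-2 (`gpt2-medium`, 1024-d) prompt embeddings before decoding.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

class PrefixProjector(nn.Module):
    """Maps one CLIP image embedding to `prefix_len` GPT-2 input embeddings."""
    def __init__(self, clip_dim=512, prefix_len=32, gpt2_dim=1024):
        super().__init__()
        self.prefix_len = prefix_len
        hidden = (clip_dim + gpt2_dim * prefix_len) // 2
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, gpt2_dim * prefix_len),
        )

    def forward(self, clip_feats):              # (B, clip_dim)
        b = clip_feats.size(0)
        return self.proj(clip_feats).view(b, self.prefix_len, -1)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2-medium")   # hidden size 1024
tok = GPT2TokenizerFast.from_pretrained("gpt2-medium")
projector = PrefixProjector()

clip_feats = torch.randn(1, 512)                        # stand-in for a DermLIP image embedding
prefix = projector(clip_feats)                          # (1, 32, 1024)
prompt_ids = tok("Describe the skin lesion", return_tensors="pt").input_ids
prompt_emb = gpt2.transformer.wte(prompt_ids)           # (1, T, 1024)
inputs_embeds = torch.cat([prefix, prompt_emb], dim=1)  # image prefix + text prompt
logits = gpt2(inputs_embeds=inputs_embeds).logits       # (1, 32 + T, vocab)
```
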
## Metrics

| Stage | val_loss | PPL | BLEU | ROUGE-L | CIDEr-D | CLIP | BERT_F1 |
|---|---:|---:|---:|---:|---:|---:|---:|
| A (META) | 1.1070 | 3.03 | 38.6 | 0.550 | 0.17 | 24.4 | 0.565 |
| B (SkinCAP) | 1.1903 | 3.29 | 10.0 | 0.278 | 0.13 | 25.9 | 0.363 |

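The reported perplexities are consistent with PPL being exp(val_loss), i.e. perplexity derived from the token-level cross-entropy; a quick arithmetic check:

```python
import math

# exp(val_loss) reproduces the reported perplexities after rounding
print(round(math.exp(1.1070), 2))  # 3.03 (Stage A)
print(round(math.exp(1.1903), 2))  # 3.29 (Stage B)
```
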
## Inference

> The minimal example below uses `inference_min.py`, which is included in this repo.
> Requires: `pip install torch transformers open_clip_torch pillow huggingface_hub`

```python
from huggingface_hub import snapshot_download
from inference_min import load_model, generate

# 1) Download the repo snapshot (weights, config, and the inference helper)
repo_dir = snapshot_download(
    "moxeeeem/dermlip-gpt2-captioner",
    allow_patterns=["*.pt", "*.json", "inference_min.py"],
)

# 2) Load the model from the saved config/weights
model = load_model(repo_dir)  # builds the CLIP backbone + GPT-2 + prefix projector

# 3) Run generation
img_paths = ["/path/to/derma_image.jpg"]  # local test images
prompt = (
    "Describe the skin lesion concisely (morphology, color, scale, border, location) "
    "in one sentence.Conclude with the most likely diagnosis (1–3 words)."
)
caps = generate(model, img_paths, prompt=prompt)
for c in caps:
    print(c)
```

## Files

| File | Size | Checksum (sha256[:12]) |
|---|---:|---|
| `best_stageA.pt` | 2 GB | 3219636f48b0 |
| `best_stageB.pt` | 2 GB | 69bded2dcad1 |
| `final_captioner_gpt2-medium_VisionTransformer.json` | 849 B | e157402c9fe2 |
| `final_captioner_gpt2-medium_VisionTransformer.pt` | 2 GB | 536ae07811c9 |
| `loss_dermlip_vitb16.png` | 110 KB | a04b1e5832d9 |

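To verify a downloaded file against the table above, hash it locally and compare the first 12 hex characters of its SHA-256 digest; a minimal sketch (the directory path is a placeholder):

```python
import hashlib
from pathlib import Path

def sha256_prefix(path, n=12, chunk=1 << 20):
    """Return the first `n` hex characters of a file's SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()[:n]

repo_dir = "."  # or the directory returned by snapshot_download above
print(sha256_prefix(Path(repo_dir) / "best_stageA.pt"))  # expected: 3219636f48b0
```
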
## Details

- **Vision Encoder**: DermLIP (ViT-B/16)
- **Language Model**: GPT-2 (`gpt2-medium`)
- **CLIP weights**: `hf-hub:redlessone/DermLIP_ViT-B-16` (see the loading sketch after this list)
- **Prefix tokens**: 32
- **Training prompt**: `Describe the skin lesion concisely (morphology, color, scale, border, location) in one sentence.Conclude with the most likely diagnosis (1–3 words).`

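If you only need the vision backbone (for example, to extract image embeddings), the DermLIP weights can be loaded directly with `open_clip`; this is a minimal sketch assuming open_clip's standard `hf-hub:` loading path, with a placeholder image path:

```python
import torch
import open_clip
from PIL import Image

# Load the DermLIP (ViT-B/16) CLIP weights from the Hugging Face Hub
model, preprocess = open_clip.create_model_from_pretrained("hf-hub:redlessone/DermLIP_ViT-B-16")
model.eval()

image = preprocess(Image.open("/path/to/derma_image.jpg")).unsqueeze(0)  # (1, 3, H, W)
with torch.no_grad():
    feats = model.encode_image(image)  # (1, embed_dim) image embedding
print(feats.shape)
```
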
### Model Type Detection
- Detected as: `dermlip`
- Repository: `moxeeeem/dermlip-gpt2-captioner`

_Auto-generated on 2025-08-30 09:25 UTC._