+---
+license: cc-by-nc-4.0
+tags:
+  - image-classification
+  - medical
+  - dermatology
+  - melanoma
+  - skin-lesion
+  - vision-transformer
+  - eva02
+  - dermoscopy
+language:
+  - en
+datasets:
+  - marmal88/skin_cancer
+library_name: timm
+---
+# EVA-02 Small — Melanoma / Skin Lesion Classifier
+**Checkpoint:** `model_0001.pt` · **Author:** Fabian Wolz · **Date:** March 2026
+---
+## 1. Introduction
+This model performs **binary classification of dermoscopic skin lesion images** — malignant vs. benign — trained on a curated multi-source ISIC dataset. It is intended as a research tool for early-stage screening assistance and to support AI research in dermatology.
+> ⚠️ **This model is not validated for clinical use and must not replace a qualified dermatologist.**
+
+The classifier is built on **EVA-02 Small**, a vision transformer pre-trained with Masked Image Modeling on ImageNet-22K and then fine-tuned on ImageNet-1K (the `mim_in22k_ft_in1k` weights in timm). Starting from these weights, the model was fine-tuned end-to-end on labelled dermoscopy images with layer-wise learning rate decay (LLRD), stochastic depth regularisation, and Exponential Moving Average (EMA) weight smoothing.
+---
+## 2. Model Overview
+| Parameter | Value |
+|---|---|
+| Architecture | EVA-02 Small (Vision Transformer) |
+| Backbone (timm model name) | `eva02_small_patch14_336.mim_in22k_ft_in1k` |
+| Pre-training | Masked Image Modeling on ImageNet-22K (~14M images), then fine-tuning on ImageNet-1K |
+| Patch size | 14 × 14 px |
+| Input resolution | 336 × 336 px |
+| Position encoding | Rotary Position Embeddings (RoPE) |
+| Activation | SwiGLU |
+| Pooling | Mean pooling of patch tokens |
+| Classification head | Linear layer (binary output) |
+| Drop path rate | 0.1 (stochastic depth regularisation) |
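
The patch size and input resolution in the table determine how many patch tokens the transformer processes, and therefore how many token embeddings the mean pooling averages. A minimal sketch of that arithmetic, where `patch_grid` is an illustrative helper and not part of timm:

```python
def patch_grid(image_size: int = 336, patch_size: int = 14) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, num_patches = patch_grid()
print(f"{per_side} x {per_side} grid, {num_patches} patch tokens")
# prints "24 x 24 grid, 576 patch tokens"
```

With the values above, each 336 × 336 image becomes a 24 × 24 grid of patches, so mean pooling averages 576 patch-token embeddings before the linear head.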
+---
+## 3. Dataset
+### 3.1 Training, Validation and Test Sets
+The dataset combines the following sources from the ISIC Archive:
+- **HAM10000** — Human Against Machine with 10000 training images (ISIC)
+- **BCN20000** — Barcelona dermoscopy collection
+- **ISIC 2018, ISIC 2019** — International Skin Imaging Collaboration challenge datasets
+All images with ambiguous or missing malignancy labels were removed. Only binary labels (malignant / benign) were retained. Images are sourced from the ISIC Archive under **CC-BY-NC 4.0**.
+| Split | Details |
+|---|---|
+| Training | Stratified by label and source dataset |
+| Validation | Hold-out set used for model selection (AUROC-based) |
+| Test | Final evaluation · 6,384 images · 1,305 positive / 5,079 negative |
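
Stratifying by label and source dataset means each (label, source) group is split separately, so every split keeps roughly the same class balance and dataset mix. The sketch below shows one way this could be done with the standard library only. The record keys `label` and `source`, the split fractions, and the function name are assumptions for illustration; the card does not state the actual ratios or tooling.

```python
import random
from collections import defaultdict


def stratified_split(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split records into train/val/test, stratified by (label, source).

    `records` is a list of dicts with hypothetical keys "label" and "source".
    The default fractions are illustrative, not the ones used for this model.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["label"], rec["source"])].append(rec)

    train, val, test = [], [], []
    for key in sorted(strata):  # fixed order keeps the split reproducible
        group = strata[key]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test


# Toy data: two sources, 20% malignant in each
records = [
    {"id": f"{src}_{i}", "label": int(i % 5 == 0), "source": src}
    for src in ("HAM10000", "BCN20000")
    for i in range(100)
]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))
```

Because each stratum is split on its own, the malignant fraction in the toy test split matches the fraction in the full toy dataset.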
+### 3.2 Preprocessing
+- U-Net segmentation: applied to images with significant non-lesion background
+- Resize to 336 × 336 px with ImageNet-standard normalisation
+- GPU-accelerated augmentation pipeline during training
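
As an illustration of the normalisation step, the sketch below applies the commonly used ImageNet channel statistics to a single 8-bit RGB pixel. `normalise_pixel` is a hypothetical helper written for this example; the real pipeline works on whole image tensors, and the exact statistics used in training are assumed here to be the standard ImageNet values.

```python
# Standard ImageNet per-channel statistics (RGB order), for inputs scaled to [0, 1]
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalise_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(normalise_pixel((128, 128, 128)))
```

After this step a mid-grey pixel sits close to zero in every channel, which is the input range the pre-trained backbone expects.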
+---
+## 4. Training Configuration
+| Parameter | Value |
+|---|---|
+| Optimizer | AdamW with layer-wise learning rate decay (LLRD) |
+| LR schedule | 10-epoch linear warmup → CosineAnnealingLR |
+| Loss function | Weighted binary cross-entropy (class imbalance correction) |
+| Epochs | 30 |
+| Mixed precision | AMP float16 |
+| EMA | Exponential Moving Average (used for validation and model selection) |
+| TTA | Test-Time Augmentation: 4 transforms |
+| Hardware | NVIDIA RTX 5070 Ti |
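
The optimiser settings in the table can be sketched in plain Python. The example below computes per-layer learning rates under LLRD, a learning-rate multiplier for a 10-epoch linear warmup followed by cosine annealing over 30 epochs, and a class-weighted binary cross-entropy on a single logit. The base learning rate, the decay factor, the warmup convention, and the `pos_weight` value are all assumptions chosen for illustration; the card does not state them, and the actual training code may differ.

```python
import math


def llrd_learning_rates(base_lr: float, num_layers: int, decay: float) -> list[float]:
    """Per-layer learning rates under layer-wise learning rate decay.

    Index 0 is the patch embedding, the last index is the classification head.
    The head trains at base_lr; each layer below it is scaled by one more
    factor of `decay`.
    """
    return [base_lr * decay ** (num_layers - i) for i in range(num_layers + 1)]


def lr_scale(epoch: int, warmup_epochs: int = 10, total_epochs: int = 30) -> float:
    """Learning-rate multiplier: linear warmup, then cosine annealing towards zero."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def weighted_bce(logit: float, target: int, pos_weight: float) -> float:
    """Binary cross-entropy on one logit, with positive samples scaled by pos_weight."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * target * math.log(p) + (1 - target) * math.log(1.0 - p))


# EVA-02 Small has 12 transformer blocks: embedding + 12 blocks + head = 14 groups
rates = llrd_learning_rates(base_lr=1e-4, num_layers=13, decay=0.75)  # hypothetical values
print(f"embedding lr {rates[0]:.2e}, head lr {rates[-1]:.2e}")
print([round(lr_scale(e), 3) for e in (0, 9, 10, 29)])
print(round(weighted_bce(0.0, target=1, pos_weight=4.0), 4))  # hypothetical pos_weight
```

LLRD gives the earliest layers the smallest updates, which helps preserve the pre-trained low-level features while the head adapts quickly to the new task.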
+---
+## 5. Evaluation Results
+- **Test set:** 6,384 images · 1,305 malignant · 5,079 benign
+- **Validation AUROC (epoch 30):** ~0.9795
+### Threshold Operating Points
+| Metric | Crossover (0.860) | Youden's J (0.770) | 95% Sensitivity (0.640) | 97% Sensitivity (0.430) | 99% Sensitivity (0.300) | 80% Specificity (0.395) |
+|---|---|---|---|---|---|---|
+| Accuracy (%) | 91.95 | 91.31 | 89.80 | 85.51 | 66.54 | 83.57 |
+| Sensitivity (%) | 91.80 | 93.72 | 95.02 | 97.01 | 99.00 | 97.55 |
+| Specificity (%) | 91.99 | 90.69 | 88.46 | 82.56 | 58.20 | 79.98 |
+| F1 Score (%) | 82.34 | 81.51 | 79.21 | 73.24 | 54.75 | 70.82 |
+| PPV (%) | 74.64 | 72.11 | 67.91 | 58.83 | 37.83 | 55.59 |
+| NPV (%) | 97.76 | 98.25 | 98.57 | 99.08 | 99.56 | 99.22 |
+| TP | 1198 | 1223 | 1240 | 1266 | 1292 | 1273 |
+| TN | 4672 | 4606 | 4493 | 4193 | 2956 | 4062 |
+| FP | 407 | 473 | 586 | 886 | 2123 | 1017 |
+| FN | 107 | 82 | 65 | 39 | 13 | 32 |
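
Every percentage in the table follows from the four confusion counts in the same column. The sketch below recomputes them for the 97% sensitivity column (threshold 0.430); `confusion_metrics` is an illustrative helper name, and the counts are taken from the table.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Derive the table's percentage metrics from raw confusion counts."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "f1": 100 * 2 * tp / (2 * tp + fp + fn),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


# Counts from the 97% sensitivity column (threshold 0.430)
metrics = confusion_metrics(tp=1266, tn=4193, fp=886, fn=39)
for name, value in metrics.items():
    print(f"{name:12s}{value:6.2f}")
```

The low PPV next to the high NPV reflects the class balance of the test set: roughly one image in five is malignant, so even a moderate false-positive rate produces many false alarms relative to true positives.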
+### Clinical Operating Points — Interpretation
+| Threshold | Use Case |
+|---|---|
+| 0.300 (99% sensitivity) | Population screening — minimise missed cancers |
+| 0.430 (97% sensitivity) | **Default — recommended general screening** |
+| 0.640 (95% sensitivity) | Balanced screening with higher specificity |
+| 0.770 (Youden's J) | Maximises Youden's J (sensitivity + specificity − 1) |
+| 0.860 (Crossover) | Sensitivity ≈ Specificity ≈ 91.9% |
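
Operating points like these are typically chosen by sweeping candidate thresholds over held-out scores. The sketch below shows two selection rules on toy data: the threshold that maximises Youden's J, and the highest threshold that still reaches a target sensitivity. The function names, the toy scores, and the candidate grid are assumptions for illustration, not the procedure used to produce the table.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(scores, labels, candidates):
    """Candidate threshold maximising J = sensitivity + specificity - 1."""
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)) - 1)


def threshold_for_sensitivity(scores, labels, target, candidates):
    """Highest candidate threshold whose sensitivity is at least `target`."""
    ok = [t for t in candidates if sens_spec(scores, labels, t)[0] >= target]
    return max(ok) if ok else None


# Toy scores: 5 malignant (label 1) and 6 benign (label 0) cases
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
candidates = [i / 100 for i in range(1, 100)]

t_youden = youden_threshold(scores, labels, candidates)
t_sens = threshold_for_sensitivity(scores, labels, target=1.0, candidates=candidates)
print(f"Youden's J threshold: {t_youden:.2f}, 100% sensitivity threshold: {t_sens:.2f}")
```

Raising the sensitivity target pushes the threshold down and costs specificity, which is the trade-off visible across the rows of the table.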
+---
+## 6. How to Use
+### Installation
+```bash
+pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
+pip install timm huggingface_hub pillow numpy
+```
+### Download the model
+```python
+from huggingface_hub import hf_hub_download
+ckpt_path = hf_hub_download(repo_id="fawo/eva02-small-melanoma-classifier", filename="model_0001.pt")
+```
+### Inference
+```python
+import timm
+import torch
+import torch.nn as nn
+from huggingface_hub import hf_hub_download
+from PIL import Image
+from timm.data import resolve_data_config
+from timm.data.transforms_factory import create_transform
+
+MODEL_NAME = "eva02_small_patch14_336.mim_in22k_ft_in1k"
+THRESHOLD = 0.430  # default operating point (97% sensitivity on the test set)
+
+
+class ISICModel(nn.Module):
+    """EVA-02 Small backbone with a single-logit head for binary classification."""
+
+    def __init__(self, model_name):
+        super().__init__()
+        # pretrained=False: the fine-tuned weights come from the checkpoint loaded below
+        self.model = timm.create_model(model_name, pretrained=False, drop_path_rate=0.1)
+        self.model.head = nn.Linear(self.model.head.in_features, 1)
+
+    def forward(self, x):
+        return self.model(x)
+
+
+# Download the checkpoint (same call as in "Download the model") and load the weights
+ckpt_path = hf_hub_download(repo_id="fawo/eva02-small-melanoma-classifier", filename="model_0001.pt")
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = ISICModel(MODEL_NAME)
+ckpt = torch.load(ckpt_path, map_location=device)
+model.load_state_dict(ckpt["model_state_dict"])
+model.to(device).eval()
+
+# Build the evaluation transform (resize + normalisation) from the timm model config
+transform = create_transform(**resolve_data_config({}, model=model.model), is_training=False)
+
+# Run inference on a single image
+img = transform(Image.open("lesion.jpg").convert("RGB")).unsqueeze(0).to(device)
+with torch.no_grad():
+    prob = torch.sigmoid(model(img)).item()
+
+# Apply the decision threshold
+label = "MALIGNANT" if prob >= THRESHOLD else "benign"
+print(f"Probability: {prob:.4f} → {label}")
+```
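
To switch between the operating points from section 5 without editing the threshold by hand, the probability can be passed through a small lookup helper. This is a sketch: the key names and the `classify` function are hypothetical, and only the threshold values come from the operating-point table.

```python
# Thresholds copied from the operating-point table in section 5
OPERATING_POINTS = {
    "crossover": 0.860,
    "youden": 0.770,
    "sens95": 0.640,
    "sens97": 0.430,  # default
    "spec80": 0.395,
    "sens99": 0.300,
}


def classify(prob: float, operating_point: str = "sens97") -> str:
    """Turn a sigmoid probability into a label at a named operating point."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    return "MALIGNANT" if prob >= OPERATING_POINTS[operating_point] else "benign"


print(classify(0.50))            # MALIGNANT at the default 0.430 threshold
print(classify(0.50, "youden"))  # benign at the 0.770 threshold
```

The same probability can therefore yield different labels depending on the operating point, so the chosen point should be reported alongside any prediction.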
+### Standalone inference script
+A ready-to-run `predict.py` with folder batch mode, CSV output, and all threshold options is available in the GitHub repository:
+👉 [github.com/FaGit99/melanoma-classifier-eva02](https://github.com/FaGit99/melanoma-classifier-eva02)
+---
+## 7. Intended Use and Limitations
+### Intended Use
+- Research and development in AI-assisted dermatology
+- Prototype screening tool — requires clinical validation before any patient-facing deployment
+- Benchmark baseline for EVA-02-based dermoscopy classifiers
+### Known Limitations
+- **Not for clinical diagnosis.** Must not replace a qualified dermatologist.
+- Trained predominantly on lighter skin tone images (HAM10000, ISIC). Performance on darker skin tones is not validated and likely degraded.
+- Spurious correlations detected via GradCAM analysis: vignette borders, ink markers, and hair artifacts can influence predictions.
+- The checkpoint is from epoch 30, the final training epoch, where the model is at the edge of overfitting.
+- Domain shift expected on images captured outside dermoscopy conditions.
+---
+## 8. License
+**CC-BY-NC 4.0** — Non-commercial use only.
+This restriction is inherited from the upstream training datasets (HAM10000, BCN20000, ISIC 2018/2019), all of which are licensed CC-BY-NC 4.0. Commercial use requires separate licensing of all source datasets.
+---
+## 9. Citation
+```bibtex
+@misc{wolz2026melanoma,
+  title   = {EVA-02 Small Melanoma Classifier},
+  author  = {Wolz, Fabian},
+  year    = {2026},
+  url     = {https://huggingface.co/fawo/eva02-small-melanoma-classifier},
+  note    = {Checkpoint model\_0001, validation AUROC 0.9795}
+}
+```
+---
+## 10. Acknowledgements
+This work was conducted by **Fabian Wolz** ([github.com/FaGit99](https://github.com/FaGit99)) as an independent research project. Machine learning strategy guidance and algorithm implementation support were provided by **Claude** (Anthropic). The intellectual direction, experimental design, clinical framing, and all scientific judgements are the author's own.
+Model architecture and pretrained weights provided via the **timm** library (Wightman, R., 2019, [github.com/huggingface/pytorch-image-models](https://github.com/huggingface/pytorch-image-models)). Training infrastructure relies on **PyTorch** (Paszke et al., 2019) and **torcheval**.
+Training data sourced from the [ISIC Archive](https://isic-archive.com): HAM10000 (Tschandl et al., 2018), BCN20000 (Combalia et al., 2019), and the ISIC 2018 and 2019 challenge datasets. The authors of these datasets are gratefully acknowledged for making their work publicly available to the research community.