EVA-02 Small: Melanoma / Skin Lesion Classifier
Checkpoint: model_0001.pt · Author: Fabian Wolz · Date: March 2026
1. Introduction
This model performs binary classification of dermoscopic skin lesion images (malignant vs. benign), trained on a curated multi-source ISIC dataset. It is intended as a research tool for early-stage screening assistance and to support AI research in dermatology.
⚠️ This model is not validated for clinical use and must not replace a qualified dermatologist.
The classifier is built on EVA-02 Small, a vision transformer pre-trained with Masked Image Modeling on ImageNet-22K. The model was fine-tuned end-to-end on labelled dermoscopy images with layer-wise learning rate decay (LLRD), stochastic depth regularisation, and Exponential Moving Average (EMA) weight smoothing.
2. Model Overview
| Parameter | Value |
|---|---|
| Architecture | EVA-02 Small (Vision Transformer) |
| Checkpoint ID | eva02_small_patch14_336.mim_in22k_ft_in1k |
| Pre-training | Masked Image Modeling on ImageNet-22K (~14M images) |
| Patch size | 14 × 14 px |
| Input resolution | 336 × 336 px |
| Position encoding | Rotary Position Embeddings (RoPE) |
| Activation | SwiGLU |
| Pooling | Mean pooling of patch tokens |
| Classification head | Linear layer (binary output) |
| Drop path rate | 0.1 (stochastic depth regularisation) |
3. Dataset
3.1 Training, Validation and Test Sets
- HAM10000 – Human Against Machine with 10000 training images (ISIC)
- BCN20000 – Barcelona dermoscopy collection
- ISIC 2018, ISIC 2019 – International Skin Imaging Collaboration challenge datasets
All images with ambiguous or missing malignancy labels were removed. Only binary labels (malignant / benign) were retained. Images are sourced from the ISIC Archive under CC-BY-NC 4.0.
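The curation step described above can be sketched in plain Python. The diagnosis strings and their malignant/benign groupings below are illustrative assumptions, not the exact rules used for this model:

```python
# Hypothetical diagnosis-to-label mapping; the actual curation rules may differ.
MALIGNANT = {"melanoma", "basal cell carcinoma", "squamous cell carcinoma"}
BENIGN = {"nevus", "seborrheic keratosis", "dermatofibroma", "vascular lesion"}

def to_binary(records):
    """Keep only records with an unambiguous malignant/benign diagnosis."""
    kept = []
    for r in records:
        dx = (r.get("diagnosis") or "").lower()
        if dx in MALIGNANT:
            kept.append({**r, "label": 1})
        elif dx in BENIGN:
            kept.append({**r, "label": 0})
        # records with missing or ambiguous diagnoses are dropped
    return kept
```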
| Split | Details |
|---|---|
| Training | Stratified by label and source dataset |
| Validation | Hold-out set used for model selection (AUROC-based) |
| Test | Final evaluation · 6,384 images · 1,305 positive / 5,079 negative |
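Stratification by label and source dataset keeps each split representative of every (dataset, label) combination. A minimal sketch, assuming illustrative 80/10/10 ratios (the card does not state the exact split fractions):

```python
import random
from collections import defaultdict

def stratified_split(records, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split records into train/val/test, stratified by (source, label)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[(r["source"], r["label"])].append(r)
    train, val, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = int(ratios[0] * len(group))
        n_val = int(ratios[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```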
3.2 Preprocessing
- U-Net segmentation: applied to images with significant non-lesion background
- Resize to 336 × 336 px with ImageNet-standard normalisation
- GPU-accelerated augmentation pipeline during training
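The resize-and-normalise step above follows the standard ImageNet recipe (in practice the transform is built from the timm model config, as shown in Section 6). A minimal numpy sketch of the normalisation, assuming an already-resized 336 × 336 RGB array:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalise(img_uint8):
    """HWC uint8 image -> CHW float32 array with ImageNet statistics."""
    x = img_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # per-channel standardise
    return x.transpose(2, 0, 1)                # HWC -> CHW
```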
4. Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | AdamW with layer-wise learning rate decay (LLRD) |
| LR schedule | 10-epoch linear warmup → CosineAnnealingLR |
| Loss function | Weighted binary cross-entropy (class imbalance correction) |
| Epochs | 30 |
| Mixed precision | AMP float16 |
| EMA | Exponential Moving Average (used for validation and model selection) |
| TTA | Test-Time Augmentation: 4 transforms |
| Hardware | NVIDIA RTX 5070 Ti |
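The interplay of LLRD, the warmup→cosine schedule, and EMA can be sketched as follows. The numeric values (0.75 layer decay, 1e-4 base LR, 0.999 EMA decay) are illustrative assumptions; the card does not state them:

```python
import math

def llrd_scales(num_layers, layer_decay=0.75):
    """Per-layer LR multipliers: the last block gets 1.0, earlier blocks
    are scaled down by layer_decay per layer (layer-wise LR decay)."""
    return [layer_decay ** (num_layers - i) for i in range(num_layers + 1)]

def lr_at(epoch, total_epochs=30, warmup_epochs=10, base_lr=1e-4, min_lr=0.0):
    """Linear warmup for the first warmup_epochs, then cosine annealing."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

def ema_update(ema, current, decay=0.999):
    """Exponential moving average of weights (dicts of floats here)."""
    for k in ema:
        ema[k] = decay * ema[k] + (1 - decay) * current[k]
```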
5. Evaluation Results
Test set: 6,384 images · 1,305 malignant · 5,079 benign
Validation AUROC (epoch 30): ~0.9795
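For reference, AUROC is the probability that a randomly chosen malignant image receives a higher score than a randomly chosen benign one. A minimal rank-based sketch (O(n·m), for illustration only):

```python
def auroc(labels, scores):
    """Mann-Whitney formulation: fraction of (positive, negative) pairs
    where the positive scores higher, counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```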
Threshold Operating Points
| Metric | Crossover (0.860) | Youden's J (0.770) | 95% Sensitivity (0.640) | 97% Sensitivity (0.430) | 99% Sensitivity (0.300) | 80% Specificity (0.395) |
|---|---|---|---|---|---|---|
| Accuracy (%) | 91.95 | 91.31 | 89.80 | 85.51 | 66.54 | 83.57 |
| Sensitivity (%) | 91.80 | 93.72 | 95.02 | 97.01 | 99.00 | 97.55 |
| Specificity (%) | 91.99 | 90.69 | 88.46 | 82.56 | 58.20 | 79.98 |
| F1 Score (%) | 82.34 | 81.51 | 79.21 | 73.24 | 54.75 | 70.82 |
| PPV (%) | 74.64 | 72.11 | 67.91 | 58.83 | 37.83 | 55.59 |
| NPV (%) | 97.76 | 98.25 | 98.57 | 99.08 | 99.56 | 99.22 |
| TP | 1198 | 1223 | 1240 | 1266 | 1292 | 1273 |
| TN | 4672 | 4606 | 4493 | 4193 | 2956 | 4062 |
| FP | 407 | 473 | 586 | 886 | 2123 | 1017 |
| FN | 107 | 82 | 65 | 39 | 13 | 32 |
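Every derived metric in the table follows directly from the confusion counts in the last four rows. A small helper reproduces, for example, the 97%-sensitivity (0.430) column:

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Derive standard binary-classification metrics (in %) from counts."""
    sens = tp / (tp + fn)                   # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    ppv = tp / (tp + fp)                    # positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    f1 = 2 * ppv * sens / (ppv + sens)      # F1 score
    pct = lambda v: round(100 * v, 2)
    return {"accuracy": pct(acc), "sensitivity": pct(sens), "specificity": pct(spec),
            "f1": pct(f1), "ppv": pct(ppv), "npv": pct(npv)}
```

For instance, `metrics_from_counts(1266, 4193, 886, 39)` reproduces the 0.430-threshold column.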
Clinical Operating Points – Interpretation
| Threshold | Use Case |
|---|---|
| 0.300 (99% sensitivity) | Population screening – minimise missed cancers |
| 0.430 (97% sensitivity) | Default – recommended general screening |
| 0.640 (95% sensitivity) | Balanced screening with higher specificity |
| 0.770 (Youden's J) | Maximises sensitivity + specificity jointly |
| 0.860 (Crossover) | Sensitivity ≈ Specificity ≈ 91.9% |
6. How to Use
Installation
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install timm pillow numpy huggingface_hub
```
Download the model
```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="fawo/eva02-small-melanoma-classifier",
    filename="model_0001.pt",
)
```
Inference
```python
import torch
import torch.nn as nn
import timm
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
from PIL import Image

MODEL_NAME = "eva02_small_patch14_336.mim_in22k_ft_in1k"

class ISICModel(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.model = timm.create_model(model_name, pretrained=False, drop_path_rate=0.1)
        self.model.head = nn.Linear(self.model.head.in_features, 1)

    def forward(self, x):
        return self.model(x)

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ISICModel(MODEL_NAME)
ckpt = torch.load(ckpt_path, map_location=device)
model.load_state_dict(ckpt["model_state_dict"])
model.to(device).eval()

# Build transform from model config
transform = create_transform(**resolve_data_config({}, model=model.model), is_training=False)

# Run inference
img = transform(Image.open("lesion.jpg").convert("RGB")).unsqueeze(0).to(device)
with torch.no_grad():
    prob = torch.sigmoid(model(img)).item()

# Apply threshold (default: 0.430 = 97% sensitivity)
label = "MALIGNANT" if prob >= 0.430 else "benign"
print(f"Probability: {prob:.4f} → {label}")
```
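The evaluation numbers in Section 5 were produced with 4-transform test-time augmentation. A plausible sketch, assuming the four views are the horizontal/vertical flip combinations (the card does not specify which transforms were used):

```python
import torch

@torch.no_grad()
def predict_tta(model, img):
    """Average sigmoid probabilities over 4 flip views of a (1, C, H, W) batch."""
    views = [img, img.flip(-1), img.flip(-2), img.flip(-1).flip(-2)]
    probs = [torch.sigmoid(model(v)).item() for v in views]
    return sum(probs) / len(probs)
```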
Standalone inference script
A ready-to-run predict.py with folder batch mode, CSV output, and all threshold options is available in the GitHub repository:
🔗 github.com/FaGit99/melanoma-classifier-eva02
7. Intended Use and Limitations
Intended Use
- Research and development in AI-assisted dermatology
- Prototype screening tool – requires clinical validation before any patient-facing deployment
- Benchmark baseline for EVA-02-based dermoscopy classifiers
Known Limitations
- Not for clinical diagnosis. Must not replace a qualified dermatologist.
- Trained predominantly on lighter skin tone images (HAM10000, ISIC). Performance on darker skin tones is not validated and likely degraded.
- Spurious correlations detected via GradCAM analysis: vignette borders, ink markers, and hair artifacts can influence predictions.
- Checkpoint taken at epoch 30, the final epoch of the training run, near the edge of overfitting.
- Domain shift expected on images captured outside dermoscopy conditions.
8. License
CC-BY-NC 4.0 – Non-commercial use only.
This restriction is inherited from the upstream training datasets (HAM10000, BCN20000, ISIC 2018/2019), all of which are licensed CC-BY-NC 4.0. Commercial use requires separate licensing of all source datasets.
9. Citation
```bibtex
@misc{wolz2026melanoma,
  title  = {EVA-02 Small Melanoma Classifier},
  author = {Wolz, Fabian},
  year   = {2026},
  url    = {https://huggingface.co/fawo/eva02-small-melanoma-classifier},
  note   = {Checkpoint model\_0001, validation AUROC 0.9795}
}
```
10. Acknowledgements
This work was conducted by Fabian Wolz (github.com/FaGit99) as an independent research project. Machine learning strategy guidance and algorithm implementation support were provided by Claude (Anthropic). The intellectual direction, experimental design, clinical framing, and all scientific judgements are the author's own.
Model architecture and pretrained weights provided via the timm library (Wightman, R., 2019, github.com/huggingface/pytorch-image-models). Training infrastructure relies on PyTorch (Paszke et al., 2019) and torcheval.
Training data sourced from the ISIC Archive: HAM10000 (Tschandl et al., 2018), BCN20000 (Combalia et al., 2019), and the ISIC 2018 and 2019 challenge datasets. The authors of these datasets are gratefully acknowledged for making their work publicly available to the research community.