EVA-02 Small – Melanoma / Skin Lesion Classifier

Checkpoint: model_0001.pt · Author: Fabian Wolz · Date: March 2026


1. Introduction

This model performs binary classification of dermoscopic skin lesion images (malignant vs. benign) and was trained on a curated multi-source ISIC dataset. It is intended as a research tool for early-stage screening assistance and to support AI research in dermatology.

⚠️ This model is not validated for clinical use and must not replace a qualified dermatologist.

The classifier is built on EVA-02 Small, a vision transformer pre-trained with Masked Image Modeling on ImageNet-22K. The model was fine-tuned end-to-end on labelled dermoscopy images with layer-wise learning rate decay (LLRD), stochastic depth regularisation, and Exponential Moving Average (EMA) weight smoothing.
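Layer-wise learning rate decay assigns each transformer block its own learning rate, shrinking geometrically toward the input so the early pre-trained features change more slowly than the head. A minimal sketch of building such optimizer parameter groups (the base LR and decay factor are illustrative, not the run's actual hyperparameters):

```python
import torch.nn as nn

def llrd_param_groups(blocks, head, base_lr=1e-4, decay=0.75):
    """Layer-wise LR decay: the head keeps the base LR, and each earlier
    block is scaled down by `decay` per layer of depth."""
    groups = [{"params": head.parameters(), "lr": base_lr}]
    n = len(blocks)
    for i, block in enumerate(blocks):
        # block 0 (closest to the input) receives the smallest LR
        groups.append({"params": block.parameters(),
                       "lr": base_lr * decay ** (n - i)})
    return groups

# Toy stand-in for the EVA-02 block stack
blocks = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
head = nn.Linear(4, 1)
groups = llrd_param_groups(blocks, head)
```

The resulting list can be passed directly to `torch.optim.AdamW(groups)`.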


2. Model Overview

| Parameter | Value |
| --- | --- |
| Architecture | EVA-02 Small (Vision Transformer) |
| Checkpoint ID | eva02_small_patch14_336.mim_in22k_ft_in1k |
| Pre-training | Masked Image Modeling on ImageNet-22K (~14M images) |
| Patch size | 14 × 14 px |
| Input resolution | 336 × 336 px |
| Position encoding | Rotary Position Embeddings (RoPE) |
| Activation | SwiGLU |
| Pooling | Mean pooling of patch tokens |
| Classification head | Linear layer (binary output) |
| Drop path rate | 0.1 (stochastic depth regularisation) |

3. Dataset

3.1 Training, Validation and Test Sets

  • HAM10000 – Human Against Machine with 10000 training images (ISIC)
  • BCN20000 – Barcelona dermoscopy collection
  • ISIC 2018, ISIC 2019 – International Skin Imaging Collaboration challenge datasets

All images with ambiguous or missing malignancy labels were removed. Only binary labels (malignant / benign) were retained. Images are sourced from the ISIC Archive under CC-BY-NC 4.0.

| Split | Details |
| --- | --- |
| Training | Stratified by label and source dataset |
| Validation | Hold-out set used for model selection (AUROC-based) |
| Test | Final evaluation · 6,384 images · 1,305 positive / 5,079 negative |
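Stratifying by label and source dataset means each (label, source) bucket contributes proportionally to every split. A minimal sketch of that idea; the record fields, validation fraction, and seed are illustrative:

```python
import random
from collections import defaultdict

def stratified_split(items, val_frac=0.1, seed=0):
    """Split records so each (label, source) stratum contributes
    proportionally to the validation set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[(item["label"], item["source"])].append(item)
    train, val = [], []
    for bucket in strata.values():
        rng.shuffle(bucket)                       # randomise within stratum
        k = max(1, round(len(bucket) * val_frac))  # per-stratum val count
        val.extend(bucket[:k])
        train.extend(bucket[k:])
    return train, val

# Toy example: 50 malignant + 50 benign records from one source
items = [{"label": lab, "source": "HAM10000"}
         for lab in ["malignant"] * 50 + ["benign"] * 50]
train, val = stratified_split(items)
```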

3.2 Preprocessing

  • U-Net segmentation: applied to images with significant non-lesion background
  • Resize to 336 × 336 px with ImageNet-standard normalisation
  • GPU-accelerated augmentation pipeline during training
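The normalisation step above can be sketched with the common ImageNet statistics. The exact constants timm resolves for this checkpoint may differ, so treat these values as assumptions (the inference snippet in Section 6 builds the authoritative transform from the model config):

```python
import torch

# Common ImageNet normalisation constants (assumed, not checkpoint-verified)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalise(img):
    """img: float tensor in [0, 1], shape (3, H, W), already resized to 336x336."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# A uniform mid-grey image as a quick sanity check
x = normalise(torch.full((3, 336, 336), 0.5))
```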

4. Training Configuration

| Parameter | Value |
| --- | --- |
| Optimizer | AdamW with layer-wise learning rate decay (LLRD) |
| LR schedule | 10-epoch linear warmup → CosineAnnealingLR |
| Loss function | Weighted binary cross-entropy (class imbalance correction) |
| Epochs | 30 |
| Mixed precision | AMP float16 |
| EMA | Exponential Moving Average (used for validation and model selection) |
| TTA | Test-Time Augmentation: 4 transforms |
| Hardware | NVIDIA RTX 5070 Ti |
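Two of these entries, the weighted BCE loss and the warmup-into-cosine schedule, can be sketched in PyTorch. The pos_weight below is derived from the test-split class counts and the learning-rate values are illustrative, not the run's actual settings:

```python
import torch
import torch.nn as nn

# Class-imbalance correction: up-weight the positive (malignant) class
# by the negative/positive ratio (counts here taken from the test split).
n_pos, n_neg = 1305, 5079
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_neg / n_pos]))

# 10-epoch linear warmup followed by cosine annealing over the rest
model = nn.Linear(8, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.SequentialLR(
    opt,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(opt, start_factor=0.1, total_iters=10),
        torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=20),
    ],
    milestones=[10],
)

lrs = []
for _ in range(30):          # one step per epoch
    opt.step()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])
```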

5. Evaluation Results

Test set: 6,384 images · 1,305 malignant · 5,079 benign
Validation AUROC (epoch 30): ~0.9795

Threshold Operating Points

| Metric | Crossover (0.860) | Youden's J (0.770) | 95% Sensitivity (0.640) | 97% Sensitivity (0.430) | 99% Sensitivity (0.300) | 80% Specificity (0.395) |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy (%) | 91.95 | 91.31 | 89.80 | 85.51 | 66.54 | 83.57 |
| Sensitivity (%) | 91.80 | 93.72 | 95.02 | 97.01 | 99.00 | 97.55 |
| Specificity (%) | 91.99 | 90.69 | 88.46 | 82.56 | 58.20 | 79.98 |
| F1 Score (%) | 82.34 | 81.51 | 79.21 | 73.24 | 54.75 | 70.82 |
| PPV (%) | 74.64 | 72.11 | 67.91 | 58.83 | 37.83 | 55.59 |
| NPV (%) | 97.76 | 98.25 | 98.57 | 99.08 | 99.56 | 99.22 |
| TP | 1198 | 1223 | 1240 | 1266 | 1292 | 1273 |
| TN | 4672 | 4606 | 4493 | 4193 | 2956 | 4062 |
| FP | 407 | 473 | 586 | 886 | 2123 | 1017 |
| FN | 107 | 82 | 65 | 39 | 13 | 32 |
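All of the percentage rows follow directly from the confusion counts in the last four rows. Recomputing one column as a consistency check:

```python
def operating_point_metrics(tp, tn, fp, fn):
    """Derive the table's percentage metrics from raw confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),   # recall on malignant cases
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),           # precision
        "npv": 100 * tn / (tn + fn),
        "f1": 100 * 2 * tp / (2 * tp + fp + fn),
    }

# Confusion counts for the crossover threshold (0.860) from the table
m = operating_point_metrics(tp=1198, tn=4672, fp=407, fn=107)
```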

Clinical Operating Points β€” Interpretation

| Threshold | Use Case |
| --- | --- |
| 0.300 (99% sensitivity) | Population screening – minimise missed cancers |
| 0.430 (97% sensitivity) | Default – recommended general screening |
| 0.640 (95% sensitivity) | Balanced screening with higher specificity |
| 0.770 (Youden's J) | Maximises sensitivity + specificity jointly |
| 0.860 (Crossover) | Sensitivity ≈ Specificity ≈ 91.9% |

6. How to Use

Installation

```shell
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install timm pillow numpy huggingface_hub
```

Download the model

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="fawo/eva02-small-melanoma-classifier", filename="model_0001.pt")
```

Inference

```python
import torch
import torch.nn as nn
import timm
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
from PIL import Image

MODEL_NAME = "eva02_small_patch14_336.mim_in22k_ft_in1k"

class ISICModel(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.model = timm.create_model(model_name, pretrained=False, drop_path_rate=0.1)
        self.model.head = nn.Linear(self.model.head.in_features, 1)

    def forward(self, x):
        return self.model(x)

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ISICModel(MODEL_NAME)
ckpt = torch.load(ckpt_path, map_location=device)
model.load_state_dict(ckpt["model_state_dict"])
model.to(device).eval()

# Build transform from model config
transform = create_transform(**resolve_data_config({}, model=model.model), is_training=False)

# Run inference
img = transform(Image.open("lesion.jpg").convert("RGB")).unsqueeze(0).to(device)
with torch.no_grad():
    prob = torch.sigmoid(model(img)).item()

# Apply threshold (default: 0.430 = 97% sensitivity)
label = "MALIGNANT" if prob >= 0.430 else "benign"
print(f"Probability: {prob:.4f} → {label}")
```
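The training configuration lists 4-transform Test-Time Augmentation. A minimal sketch of averaging sigmoid outputs over flip-based views; the particular set of four transforms is an assumption, not confirmed by the model card:

```python
import torch

def tta_probability(model, img):
    """Average sigmoid probabilities over 4 test-time views:
    identity, horizontal flip, vertical flip, 180-degree rotation."""
    views = [
        img,
        torch.flip(img, dims=[3]),     # horizontal flip
        torch.flip(img, dims=[2]),     # vertical flip
        torch.flip(img, dims=[2, 3]),  # 180-degree rotation
    ]
    with torch.no_grad():
        probs = [torch.sigmoid(model(v)) for v in views]
    return torch.stack(probs).mean().item()

# Demo with a stand-in network; with the real classifier, pass the
# transformed image tensor from the inference snippet above instead.
demo = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 16, 1))
prob = tta_probability(demo, torch.zeros(1, 3, 16, 16))
```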

Standalone inference script

A ready-to-run predict.py with folder batch mode, CSV output, and all threshold options is available in the GitHub repository:
👉 github.com/FaGit99/melanoma-classifier-eva02


7. Intended Use and Limitations

Intended Use

  • Research and development in AI-assisted dermatology
  • Prototype screening tool – requires clinical validation before any patient-facing deployment
  • Benchmark baseline for EVA-02-based dermoscopy classifiers

Known Limitations

  • Not for clinical diagnosis. Must not replace a qualified dermatologist.
  • Trained predominantly on lighter skin tone images (HAM10000, ISIC). Performance on darker skin tones is not validated and likely degraded.
  • Spurious correlations detected via GradCAM analysis: vignette borders, ink markers, and hair artifacts can influence predictions.
  • The released checkpoint is from epoch 30 of the training run, at the edge of overfitting.
  • Domain shift expected on images captured outside dermoscopy conditions.

8. License

CC-BY-NC 4.0 – Non-commercial use only.

This restriction is inherited from the upstream training datasets (HAM10000, BCN20000, ISIC 2018/2019), all of which are licensed CC-BY-NC 4.0. Commercial use requires separate licensing of all source datasets.


9. Citation

@misc{wolz2026melanoma,
  title   = {EVA-02 Small Melanoma Classifier},
  author  = {Wolz, Fabian},
  year    = {2026},
  url     = {https://huggingface.co/fawo/eva02-small-melanoma-classifier},
  note    = {Checkpoint model\_0001, validation AUROC 0.9795}
}

10. Acknowledgements

This work was conducted by Fabian Wolz (github.com/FaGit99) as an independent research project. Machine learning strategy guidance and algorithm implementation support were provided by Claude (Anthropic). The intellectual direction, experimental design, clinical framing, and all scientific judgements are the author's own.

Model architecture and pretrained weights provided via the timm library (Wightman, R., 2019, github.com/huggingface/pytorch-image-models). Training infrastructure relies on PyTorch (Paszke et al., 2019) and torcheval.

Training data sourced from the ISIC Archive: HAM10000 (Tschandl et al., 2018), BCN20000 (Combalia et al., 2019), and the ISIC 2018 and 2019 challenge datasets. The authors of these datasets are gratefully acknowledged for making their work publicly available to the research community.
