EVA-02 Small – Melanoma / Skin Lesion Classifier

Checkpoint: model_0001.pt · Author: Fabian Wolz · Date: March 2026


1. Introduction

This model performs binary classification of dermoscopic skin lesion images (malignant vs. benign) and was trained on a curated multi-source ISIC dataset. It is intended as a research tool for early-stage screening assistance and to support AI research in dermatology.

⚠️ This model is not validated for clinical use and must not replace a qualified dermatologist.

The classifier is built on EVA-02 Small, a vision transformer pre-trained with Masked Image Modeling on ImageNet-22K. The model was fine-tuned end-to-end on labelled dermoscopy images with layer-wise learning rate decay (LLRD), stochastic depth regularisation, and Exponential Moving Average (EMA) weight smoothing.
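Layer-wise learning rate decay assigns each transformer block its own learning rate, shrinking geometrically toward the input so the early pre-trained features change more slowly than the head. A minimal sketch of building such optimizer parameter groups (the base LR and decay factor are illustrative, not the run's actual hyperparameters):

```python
import torch.nn as nn

def llrd_param_groups(blocks, head, base_lr=1e-4, decay=0.75):
    """Layer-wise LR decay: the head keeps the base LR, and each earlier
    block is scaled down by `decay` per layer of depth."""
    groups = [{"params": head.parameters(), "lr": base_lr}]
    n = len(blocks)
    for i, block in enumerate(blocks):
        # block 0 (closest to the input) receives the smallest LR
        groups.append({"params": block.parameters(),
                       "lr": base_lr * decay ** (n - i)})
    return groups

# Toy stand-in for the EVA-02 block stack
blocks = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
head = nn.Linear(4, 1)
groups = llrd_param_groups(blocks, head)
```

The resulting list can be passed directly to `torch.optim.AdamW(groups)`.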


2. Model Overview

| Parameter | Value |
| --- | --- |
| Architecture | EVA-02 Small (Vision Transformer) |
| Checkpoint ID | eva02_small_patch14_336.mim_in22k_ft_in1k |
| Pre-training | Masked Image Modeling on ImageNet-22K (~14M images) |
| Patch size | 14 × 14 px |
| Input resolution | 336 × 336 px |
| Position encoding | Rotary Position Embeddings (RoPE) |
| Activation | SwiGLU |
| Pooling | Mean pooling of patch tokens |
| Classification head | Linear layer (binary output) |
| Drop path rate | 0.1 (stochastic depth regularisation) |

3. Dataset

3.1 Training, Validation and Test Sets

  • HAM10000 – Human Against Machine with 10000 training images (ISIC)
  • BCN20000 – Barcelona dermoscopy collection
  • ISIC 2018, ISIC 2019 – International Skin Imaging Collaboration challenge datasets

All images with ambiguous or missing malignancy labels were removed. Only binary labels (malignant / benign) were retained. Images are sourced from the ISIC Archive under CC-BY-NC 4.0.

| Split | Details |
| --- | --- |
| Training | Stratified by label and source dataset |
| Validation | Hold-out set used for model selection (AUROC-based) |
| Test | Final evaluation · 6,384 images · 1,305 positive / 5,079 negative |
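Stratifying by label and source dataset means each (label, source) bucket contributes proportionally to every split. A minimal sketch of that idea; the record fields, validation fraction, and seed are illustrative:

```python
import random
from collections import defaultdict

def stratified_split(items, val_frac=0.1, seed=0):
    """Split records so each (label, source) stratum contributes
    proportionally to the validation set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[(item["label"], item["source"])].append(item)
    train, val = [], []
    for bucket in strata.values():
        rng.shuffle(bucket)                       # randomise within stratum
        k = max(1, round(len(bucket) * val_frac))  # per-stratum val count
        val.extend(bucket[:k])
        train.extend(bucket[k:])
    return train, val

# Toy example: 50 malignant + 50 benign records from one source
items = [{"label": lab, "source": "HAM10000"}
         for lab in ["malignant"] * 50 + ["benign"] * 50]
train, val = stratified_split(items)
```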

3.2 Preprocessing

  • U-Net segmentation: applied to images with significant non-lesion background
  • Resize to 336 × 336 px with ImageNet-standard normalisation
  • GPU-accelerated augmentation pipeline during training
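The normalisation step above can be sketched with the common ImageNet statistics. The exact constants timm resolves for this checkpoint may differ, so treat these values as assumptions (the inference snippet in Section 6 builds the authoritative transform from the model config):

```python
import torch

# Common ImageNet normalisation constants (assumed, not checkpoint-verified)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalise(img):
    """img: float tensor in [0, 1], shape (3, H, W), already resized to 336x336."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# A uniform mid-grey image as a quick sanity check
x = normalise(torch.full((3, 336, 336), 0.5))
```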

4. Training Configuration

| Parameter | Value |
| --- | --- |
| Optimizer | AdamW with layer-wise learning rate decay (LLRD) |
| LR schedule | 10-epoch linear warmup → CosineAnnealingLR |
| Loss function | Weighted binary cross-entropy (class imbalance correction) |
| Epochs | 30 |
| Mixed precision | AMP float16 |
| EMA | Exponential Moving Average (used for validation and model selection) |
| TTA | Test-Time Augmentation: 4 transforms |
| Hardware | NVIDIA RTX 5070 Ti |
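Two of these entries, the weighted BCE loss and the warmup-into-cosine schedule, can be sketched in PyTorch. The pos_weight below is derived from the test-split class counts and the learning-rate values are illustrative, not the run's actual settings:

```python
import torch
import torch.nn as nn

# Class-imbalance correction: up-weight the positive (malignant) class
# by the negative/positive ratio (counts here taken from the test split).
n_pos, n_neg = 1305, 5079
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_neg / n_pos]))

# 10-epoch linear warmup followed by cosine annealing over the rest
model = nn.Linear(8, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.SequentialLR(
    opt,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(opt, start_factor=0.1, total_iters=10),
        torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=20),
    ],
    milestones=[10],
)

lrs = []
for _ in range(30):          # one step per epoch
    opt.step()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])
```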

5. Evaluation Results

Test set: 6,384 images · 1,305 malignant · 5,079 benign
Validation AUROC (epoch 30): ~0.9795

Threshold Operating Points

| Metric | Crossover (0.860) | Youden's J (0.770) | 95% Sensitivity (0.640) | 97% Sensitivity (0.430) | 99% Sensitivity (0.300) | 80% Specificity (0.395) |
| --- | --- | --- | --- | --- | --- | --- |
| Accuracy (%) | 91.95 | 91.31 | 89.80 | 85.51 | 66.54 | 83.57 |
| Sensitivity (%) | 91.80 | 93.72 | 95.02 | 97.01 | 99.00 | 97.55 |
| Specificity (%) | 91.99 | 90.69 | 88.46 | 82.56 | 58.20 | 79.98 |
| F1 Score (%) | 82.34 | 81.51 | 79.21 | 73.24 | 54.75 | 70.82 |
| PPV (%) | 74.64 | 72.11 | 67.91 | 58.83 | 37.83 | 55.59 |
| NPV (%) | 97.76 | 98.25 | 98.57 | 99.08 | 99.56 | 99.22 |
| TP | 1198 | 1223 | 1240 | 1266 | 1292 | 1273 |
| TN | 4672 | 4606 | 4493 | 4193 | 2956 | 4062 |
| FP | 407 | 473 | 586 | 886 | 2123 | 1017 |
| FN | 107 | 82 | 65 | 39 | 13 | 32 |
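All of the percentage rows follow directly from the confusion counts in the last four rows. Recomputing one column as a consistency check:

```python
def operating_point_metrics(tp, tn, fp, fn):
    """Derive the table's percentage metrics from raw confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),   # recall on malignant cases
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),           # precision
        "npv": 100 * tn / (tn + fn),
        "f1": 100 * 2 * tp / (2 * tp + fp + fn),
    }

# Confusion counts for the crossover threshold (0.860) from the table
m = operating_point_metrics(tp=1198, tn=4672, fp=407, fn=107)
```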

Clinical Operating Points β€” Interpretation

| Threshold | Use Case |
| --- | --- |
| 0.300 (99% sensitivity) | Population screening – minimise missed cancers |
| 0.430 (97% sensitivity) | Default – recommended general screening |
| 0.640 (95% sensitivity) | Balanced screening with higher specificity |
| 0.770 (Youden's J) | Maximises sensitivity + specificity jointly |
| 0.860 (Crossover) | Sensitivity ≈ Specificity ≈ 91.9% |

6. How to Use

Installation

```shell
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install timm pillow numpy huggingface_hub
```

Download the model

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="fawo/eva02-small-melanoma-classifier", filename="model_0001.pt")
```

Inference

```python
import torch
import torch.nn as nn
import timm
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
from PIL import Image

MODEL_NAME = "eva02_small_patch14_336.mim_in22k_ft_in1k"

class ISICModel(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.model = timm.create_model(model_name, pretrained=False, drop_path_rate=0.1)
        self.model.head = nn.Linear(self.model.head.in_features, 1)

    def forward(self, x):
        return self.model(x)

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ISICModel(MODEL_NAME)
ckpt = torch.load(ckpt_path, map_location=device)
model.load_state_dict(ckpt["model_state_dict"])
model.to(device).eval()

# Build transform from model config
transform = create_transform(**resolve_data_config({}, model=model.model), is_training=False)

# Run inference
img = transform(Image.open("lesion.jpg").convert("RGB")).unsqueeze(0).to(device)
with torch.no_grad():
    prob = torch.sigmoid(model(img)).item()

# Apply threshold (default: 0.430 = 97% sensitivity)
label = "MALIGNANT" if prob >= 0.430 else "benign"
print(f"Probability: {prob:.4f} → {label}")
```
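The training configuration lists 4-transform Test-Time Augmentation. A minimal sketch of averaging sigmoid outputs over flip-based views; the particular set of four transforms is an assumption, not confirmed by the model card:

```python
import torch

def tta_probability(model, img):
    """Average sigmoid probabilities over 4 test-time views:
    identity, horizontal flip, vertical flip, 180-degree rotation."""
    views = [
        img,
        torch.flip(img, dims=[3]),     # horizontal flip
        torch.flip(img, dims=[2]),     # vertical flip
        torch.flip(img, dims=[2, 3]),  # 180-degree rotation
    ]
    with torch.no_grad():
        probs = [torch.sigmoid(model(v)) for v in views]
    return torch.stack(probs).mean().item()

# Demo with a stand-in network; with the real classifier, pass the
# transformed image tensor from the inference snippet above instead.
demo = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 16, 1))
prob = tta_probability(demo, torch.zeros(1, 3, 16, 16))
```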

Standalone inference script

A ready-to-run predict.py with folder batch mode, CSV output, and all threshold options is available in the GitHub repository:
👉 github.com/FaGit99/melanoma-classifier-eva02


7. Intended Use and Limitations

Intended Use

  • Research and development in AI-assisted dermatology
  • Prototype screening tool – requires clinical validation before any patient-facing deployment
  • Benchmark baseline for EVA-02-based dermoscopy classifiers

Known Limitations

  • Not for clinical diagnosis. Must not replace a qualified dermatologist.
  • Trained predominantly on lighter skin tone images (HAM10000, ISIC). Performance on darker skin tones is not validated and likely degraded.
  • Spurious correlations detected via GradCAM analysis: vignette borders, ink markers, and hair artifacts can influence predictions.
  • The released checkpoint is from epoch 30 of the training run, at the edge of overfitting.
  • Domain shift expected on images captured outside dermoscopy conditions.

8. License

CC-BY-NC 4.0 – Non-commercial use only.

This restriction is inherited from the upstream training datasets (HAM10000, BCN20000, ISIC 2018/2019), all of which are licensed CC-BY-NC 4.0. Commercial use requires separate licensing of all source datasets.


9. Citation

@misc{wolz2026melanoma,
  title   = {EVA-02 Small Melanoma Classifier},
  author  = {Wolz, Fabian},
  year    = {2026},
  url     = {https://huggingface.co/fawo/eva02-small-melanoma-classifier},
  note    = {Checkpoint model\_0001, validation AUROC 0.9795}
}

10. Acknowledgements

This work was conducted by Fabian Wolz (github.com/FaGit99) as an independent research project. Machine learning strategy guidance and algorithm implementation support were provided by Claude (Anthropic). The intellectual direction, experimental design, clinical framing, and all scientific judgements are the author's own.

Model architecture and pretrained weights provided via the timm library (Wightman, R., 2019, github.com/huggingface/pytorch-image-models). Training infrastructure relies on PyTorch (Paszke et al., 2019) and torcheval.

Training data sourced from the ISIC Archive: HAM10000 (Tschandl et al., 2018), BCN20000 (Combalia et al., 2019), and the ISIC 2018 and 2019 challenge datasets. The authors of these datasets are gratefully acknowledged for making their work publicly available to the research community.
