# ConvNeXt Dual-Modal Skin Lesion Classifier (ISIC 2025 / MILK10k)
**Research prototype – not validated for clinical use.** This model is released for reproducibility and research purposes only. It must not be used to guide clinical decisions, patient triage, or any diagnostic process. See Limitations and Out-of-Scope Uses below.
## Model Description
A dual-input ConvNeXt-Base architecture trained end-to-end on the MILK10k dataset (ISIC 2025 Challenge). The model processes a dermoscopic image and a clinical close-up photograph of the same lesion simultaneously, fusing feature representations before classification. It was developed as a research component submitted to the MedGemma Impact Challenge.
| Property | Value |
|---|---|
| Architecture | Dual ConvNeXt-Base encoders (one per modality), late fusion |
| Input | Paired dermoscopic + clinical images (384×384 px each) |
| Output | Softmax probabilities over 11 ISIC diagnostic classes |
| Training | 5-fold stratified cross-validation, macro F1 optimisation |
| Ensemble | 5 models (one per fold), predictions averaged at inference |
## Intended Use
This model is released strictly for non-commercial research and educational purposes, as part of the SkinAI application submitted to the MedGemma Impact Challenge. It is provided to support reproducibility of the challenge submission and to enable further research into multi-modal skin lesion classification.
Intended users: Researchers and developers working on dermatology AI, machine learning in medical imaging, or related computational fields.
## Out-of-Scope Uses
The following uses are explicitly out of scope and are not supported:
- **Clinical diagnosis or decision support** – the model has not been validated for clinical deployment and must not influence patient care in any setting.
- **Patient triage or screening** – performance has only been evaluated on held-out folds of the MILK10k training distribution; generalisability to other populations, imaging devices, or clinical workflows is unknown.
- **Autonomous or semi-autonomous medical decision making** – any application in which model outputs could directly or indirectly affect patient management.
- **Deployment without independent clinical validation** – any production use would require prospective validation by qualified clinicians under appropriate regulatory oversight.
The performance metrics reported below reflect internal cross-validation on a single dataset and are not sufficient evidence of clinical utility.
## Diagnostic Classes
| Class | Description |
|---|---|
| AKIEC | Actinic keratosis / intraepithelial carcinoma |
| BCC | Basal cell carcinoma |
| BEN_OTH | Other benign lesion |
| BKL | Benign keratosis |
| DF | Dermatofibroma |
| INF | Inflammatory / infectious |
| MAL_OTH | Other malignant lesion |
| MEL | Melanoma |
| NV | Melanocytic nevus |
| SCCKA | Squamous cell carcinoma / keratoacanthoma |
| VASC | Vascular lesion |
## Performance

**Important caveat:** All metrics below are from held-out validation folds of the MILK10k training dataset using 5-fold stratified cross-validation. They represent performance under distribution-matched conditions and should not be interpreted as estimates of real-world clinical performance. External validation has not been performed.
### Aggregate Metrics
| Metric | Value |
|---|---|
| Balanced Multiclass Accuracy | 0.665 |
| Macro F1 (ConvNeXt alone) | 0.555 |
| Macro F1 (MedSigLIP + ConvNeXt ensemble) | 0.591 |
| ISIC 2025 Leaderboard Score (Dice) | 0.538 |
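Balanced multiclass accuracy is the unweighted mean of per-class sensitivities, and macro F1 the unweighted mean of per-class F1 scores; both give each of the 11 classes equal weight regardless of prevalence, which matters given the imbalance noted under Limitations. A minimal pure-Python sketch of the two metrics (the actual evaluation presumably used a standard library such as scikit-learn):

```python
def balanced_accuracy(y_true, y_pred, num_classes):
    """Mean of per-class recalls (sensitivities)."""
    recalls = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        total = sum(t == c for t in y_true)
        recalls.append(tp / total if total else 0.0)
    return sum(recalls) / num_classes

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / num_classes

# Three-class toy example
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
balanced_accuracy(y_true, y_pred, 3)  # -> 0.666...
macro_f1(y_true, y_pred, 3)           # -> 0.655...
```

Because rare classes contribute as much as common ones, a few misclassified INF or MAL_OTH lesions move these aggregates noticeably.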
### Per-Class Metrics (Validation, Single ConvNeXt Fold)
| Class | AUC | AUC (Sens>80%) | Avg Precision | Sensitivity | Specificity | Dice | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| AKIEC | 0.933 | 0.873 | 0.704 | 0.732 | 0.924 | 0.675 | 0.627 | 0.952 |
| BCC | 0.975 | 0.960 | 0.838 | 0.951 | 0.919 | 0.758 | 0.630 | 0.992 |
| BEN_OTH | 0.978 | 0.953 | 0.505 | 0.429 | 0.998 | 0.545 | 0.750 | 0.992 |
| BKL | 0.881 | 0.713 | 0.746 | 0.750 | 0.865 | 0.664 | 0.595 | 0.929 |
| DF | 0.986 | 0.983 | 0.536 | 0.833 | 0.992 | 0.667 | 0.556 | 0.998 |
| INF | 0.841 | 0.722 | 0.164 | 0.364 | 0.985 | 0.364 | 0.364 | 0.985 |
| MAL_OTH | 0.820 | 0.717 | 0.518 | 0.400 | 0.993 | 0.571 | 1.000 | 0.987 |
| MEL | 0.957 | 0.935 | 0.820 | 0.821 | 0.950 | 0.688 | 0.593 | 0.984 |
| NV | 0.960 | 0.948 | 0.845 | 0.865 | 0.963 | 0.796 | 0.738 | 0.983 |
| SCCKA | 0.949 | 0.911 | 0.857 | 0.863 | 0.903 | 0.798 | 0.743 | 0.953 |
| VASC | 0.993 | 0.991 | 0.614 | 0.800 | 0.994 | 0.667 | 0.571 | 0.998 |
| Mean | 0.934 | 0.883 | 0.650 | 0.710 | 0.954 | 0.654 | 0.651 | 0.978 |
Rare classes (INF: ~11 lesions, MAL_OTH: ~15 lesions, VASC: ~15 lesions) are severely underrepresented in MILK10k. Sensitivity figures for these classes should be interpreted with caution given the small sample sizes involved.
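The AUC column above is a one-vs-rest ranking metric: the probability that a randomly chosen lesion of the class receives a higher score than a randomly chosen lesion outside it (the "AUC (Sens>80%)" column restricts this to the high-sensitivity operating region). A small sketch of the plain rank-based (Mann-Whitney) AUC:

```python
def auc_rank(y_true, scores):
    """One-vs-rest AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs in which the positive is scored higher
    (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ranked correctly
auc_rank([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])  # -> 0.75
```

For a class with ~11 positives, the AUC is estimated from only a few hundred such pairs, which is why the rare-class figures above carry wide uncertainty.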
## Usage
This code is provided for research reproducibility. Users are responsible for ensuring any application complies with applicable laws and ethical guidelines.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
from PIL import Image
import torchvision.transforms as transforms
from huggingface_hub import hf_hub_download

# --- Model Definition ---
class DualConvNeXt(nn.Module):
    def __init__(self, num_classes=11, model_name='convnext_base'):
        super().__init__()
        # One encoder per modality; features are concatenated (late fusion)
        self.clinical_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        self.derm_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        feat_dim = self.clinical_encoder.num_features
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim * 2, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )

    def forward(self, clinical, derm):
        c = self.clinical_encoder(clinical)
        d = self.derm_encoder(derm)
        return self.classifier(torch.cat([c, d], dim=1))

# --- Load Model ---
CLASS_NAMES = ['AKIEC', 'BCC', 'BEN_OTH', 'BKL', 'DF',
               'INF', 'MAL_OTH', 'MEL', 'NV', 'SCCKA', 'VASC']

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DualConvNeXt(num_classes=11)

weights_path = hf_hub_download(
    repo_id="tech-doc/ConvNeXt_Milk10k",
    filename="convnext_fold0_best.pth"
)
checkpoint = torch.load(weights_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().to(device)

# --- Preprocessing ---
transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # ImageNet statistics
        std=[0.229, 0.224, 0.225]
    )
])

# --- Inference ---
def predict(clinical_image_path: str, derm_image_path: str) -> dict:
    """
    Research inference only. Output must not be used for clinical decisions.

    Args:
        clinical_image_path: Path to clinical close-up photograph
        derm_image_path: Path to dermoscopic image

    Returns:
        dict with 'prediction', 'confidence', and 'probabilities'
    """
    clinical = transform(Image.open(clinical_image_path).convert('RGB')).unsqueeze(0).to(device)
    derm = transform(Image.open(derm_image_path).convert('RGB')).unsqueeze(0).to(device)
    with torch.no_grad():
        logits = model(clinical, derm)
        probs = F.softmax(logits, dim=1).squeeze(0).cpu().numpy()
    return {
        'prediction': CLASS_NAMES[probs.argmax()],
        'confidence': float(probs.max()),
        'probabilities': {c: float(p) for c, p in zip(CLASS_NAMES, probs)}
    }

# Example
result = predict('clinical.jpg', 'dermoscopy.jpg')
print(f"Prediction: {result['prediction']} ({result['confidence']:.1%})")
```
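The example above loads a single fold checkpoint, while the reported leaderboard result averages the softmax outputs of all five fold models (see the Ensemble row in the description table). Given per-fold probability dicts as returned by `predict()`, the ensembling step reduces to a mean over folds; a minimal sketch (each fold model would first be loaded from its own checkpoint, and the fold 1–4 filenames are assumed to follow the `convnext_fold0_best.pth` pattern):

```python
def average_folds(fold_probabilities):
    """Average per-fold probability dicts (as returned by predict())
    and report the ensemble prediction."""
    classes = list(fold_probabilities[0])
    n = len(fold_probabilities)
    avg = {c: sum(p[c] for p in fold_probabilities) / n for c in classes}
    top = max(avg, key=avg.get)
    return {'prediction': top, 'confidence': avg[top], 'probabilities': avg}

# Toy two-fold example over three classes
folds = [{'MEL': 0.6, 'NV': 0.3, 'BCC': 0.1},
         {'MEL': 0.4, 'NV': 0.5, 'BCC': 0.1}]
average_folds(folds)  # -> prediction 'MEL', confidence 0.5
```

Averaging probabilities rather than hard votes preserves each fold's uncertainty and is what the "predictions averaged at inference" row describes.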
## Training Details
| Parameter | Value |
|---|---|
| Base model | convnext_base (ImageNet-22k pretrained via timm) |
| Image size | 384×384 px |
| Batch size | 32 |
| Optimiser | AdamW, lr=1e-4 |
| Scheduler | Cosine annealing with warm restarts |
| Loss | Cross-entropy with class weights + focal loss |
| Augmentation | Random flips, rotations, colour jitter, RandAugment |
| Folds | 5-fold stratified CV (seed 42) |
| Hardware | NVIDIA A100 (Google Colab) |
| Training time | ~4–6 hours per fold |
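The loss combines class-weighted cross-entropy with a focal term that down-weights well-classified examples, which helps under the severe class imbalance described in Limitations. A single-sample sketch (the exact class weights, focal gamma, and combination used in training are not specified here, so the values below are purely illustrative):

```python
import math

def weighted_focal_ce(probs, target, class_weights, gamma=2.0):
    """Class-weighted focal cross-entropy for one sample.
    probs: softmax output; target: true class index.
    With gamma = 0 this reduces to plain weighted cross-entropy."""
    p_t = probs[target]
    return -class_weights[target] * (1 - p_t) ** gamma * math.log(p_t)

probs = [0.7, 0.2, 0.1]    # model fairly confident in class 0
weights = [0.5, 2.0, 2.0]  # illustrative inverse-frequency weights
weighted_focal_ce(probs, 0, weights)           # focal term shrinks the loss
weighted_focal_ce(probs, 0, weights, gamma=0)  # plain weighted CE, larger
```

The `(1 - p_t) ** gamma` factor leaves hard, misclassified examples (small `p_t`) nearly untouched while suppressing the contribution of easy ones, so rare-class errors dominate the gradient.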
## Limitations
- Single-dataset evaluation: Trained and evaluated exclusively on MILK10k (~5,240 lesions). No external validation has been performed. Reported metrics should not be generalised beyond this distribution.
- Severe class imbalance: Rare classes (INF: ~11 lesions, MAL_OTH: ~15 lesions, VASC: ~15 lesions) are underrepresented. Performance on these classes is highly uncertain and may not be reproducible on different samples.
- Paired-image requirement: The model requires simultaneous dermoscopic and clinical photographs of the same lesion. Single-image inference is architecturally unsupported and was not evaluated.
- Skin tone representation: The MILK10k dataset composition with respect to Fitzpatrick phototype has not been fully characterised. Performance across darker skin tones (Fitzpatrick IVโVI) has not been validated.
- Paediatric populations: The model was not evaluated on paediatric patients.
- Device variability: Performance may degrade with imaging devices, magnifications, or lighting conditions not represented in the training data.
- No prospective validation: All reported metrics are from retrospective cross-validation. Prospective clinical validation would be required before any consideration of real-world use.
## Citation
If you use this model or the MILK10k dataset in your research, please cite:

```bibtex
@dataset{milk10k2025,
  author    = {MILK study team},
  title     = {MILK10k},
  year      = {2025},
  publisher = {ISIC Archive},
  doi       = {10.34970/648456}
}
```
## License
CC BY-NC 4.0. This model was trained on MILK10k data (CC BY-NC licensed) and is for non-commercial research use only. Any commercial application is prohibited without explicit permission.