---
license: cc-by-nc-4.0
tags:
- skin-lesion
- dermoscopy
- classification
- convnext
- medical-imaging
- research
datasets:
- ISIC/MILK10k
metrics:
- f1
- auc
language:
- en
pipeline_tag: image-classification
---

# ConvNeXt Dual-Modal Skin Lesion Classifier (ISIC 2025 / MILK10k)

> **Research prototype — not validated for clinical use.**
> This model is released for reproducibility and research purposes only. It must not be used to guide clinical decisions, patient triage, or any diagnostic process. See [Limitations](#limitations) and [Out of Scope](#out-of-scope-uses).

---

## Model Description

A dual-input ConvNeXt-Base architecture trained end-to-end on the [MILK10k dataset](https://doi.org/10.34970/648456) (ISIC 2025 Challenge). The model processes a dermoscopic image and a clinical close-up photograph of the same lesion simultaneously, fusing the two feature representations before classification. It was developed as a research component of the SkinAI application submitted to the MedGemma Impact Challenge.

| Property | Value |
|---|---|
| Architecture | Dual ConvNeXt-Base encoders (one per modality), late fusion |
| Input | Paired dermoscopic + clinical images (384×384 px each) |
| Output | Softmax probabilities over 11 ISIC diagnostic classes |
| Training | 5-fold stratified cross-validation, macro F1 optimisation |
| Ensemble | 5 models (one per fold), predictions averaged at inference |

---

## Intended Use

This model is released strictly for **non-commercial research and educational purposes**, as part of the SkinAI application submitted to the MedGemma Impact Challenge. It is provided to support reproducibility of the challenge submission and to enable further research into multi-modal skin lesion classification.

**Intended users:** Researchers and developers working on dermatology AI, machine learning in medical imaging, or related computational fields.
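The description above states that inference averages the predictions of the five fold models. A minimal sketch of that averaging step, assuming a list of already-loaded fold models (the `DualConvNeXt` class is defined in the Usage section; whether the original pipeline averaged probabilities or logits is not specified, so this is one plausible reading):

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, clinical, derm):
    """Average softmax probabilities over the fold models.

    `models` is a list of DualConvNeXt instances, each loaded with its
    fold checkpoint and set to eval(); `clinical` and `derm` are batched
    image tensors of shape (B, 3, 384, 384).
    """
    with torch.no_grad():
        fold_probs = [F.softmax(m(clinical, derm), dim=1) for m in models]
    return torch.stack(fold_probs).mean(dim=0)  # (B, 11) averaged probabilities
```

Averaging probabilities rather than logits keeps each fold's contribution on the same scale regardless of per-fold logit magnitudes.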
---

## Out-of-Scope Uses

The following uses are explicitly out of scope and are **not supported**:

- **Clinical diagnosis or decision support** — the model has not been validated for clinical deployment and must not influence patient care in any setting.
- **Patient triage or screening** — performance has only been evaluated on held-out folds of the MILK10k training distribution; generalisability to other populations, imaging devices, or clinical workflows is unknown.
- **Autonomous or semi-autonomous medical decision making** — any application in which model outputs could directly or indirectly affect patient management.
- **Deployment without independent clinical validation** — any production use would require prospective validation by qualified clinicians under appropriate regulatory oversight.

The performance metrics reported below reflect internal cross-validation on a single dataset and are **not sufficient evidence of clinical utility**.

---

## Diagnostic Classes

| Class | Description |
|---|---|
| AKIEC | Actinic keratosis / intraepithelial carcinoma |
| BCC | Basal cell carcinoma |
| BEN_OTH | Other benign lesion |
| BKL | Benign keratosis |
| DF | Dermatofibroma |
| INF | Inflammatory / infectious |
| MAL_OTH | Other malignant lesion |
| MEL | Melanoma |
| NV | Melanocytic nevus |
| SCCKA | Squamous cell carcinoma / keratoacanthoma |
| VASC | Vascular lesion |

---

## Performance

> **Important caveat:** All metrics below are from held-out validation folds of the MILK10k training dataset using 5-fold stratified cross-validation. They represent performance under distribution-matched conditions and should not be interpreted as estimates of real-world clinical performance. External validation has not been performed.
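For reference, the headline figures reported in the next section correspond to standard multiclass metrics. A minimal sketch of how balanced accuracy and macro F1 are computed from held-out fold predictions with scikit-learn (the toy arrays are purely illustrative, not model outputs):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

def aggregate_metrics(y_true, y_pred):
    """Balanced accuracy is the unweighted mean of per-class recall;
    macro F1 averages per-class F1 scores, so rare classes count as
    much as common ones."""
    return {
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }

# Illustrative toy labels over 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 0, 2, 2])
metrics = aggregate_metrics(y_true, y_pred)
```

Both metrics weight every class equally, which is why they are preferred over plain accuracy for a dataset as imbalanced as MILK10k.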
### Aggregate Metrics

| Metric | Value |
|---|---|
| Balanced multiclass accuracy | 0.665 |
| Macro F1 (ConvNeXt alone) | 0.555 |
| Macro F1 (MedSigLIP + ConvNeXt ensemble) | 0.591 |
| ISIC 2025 leaderboard score (Dice) | 0.538 |

### Per-Class Metrics (Validation, Single ConvNeXt Fold)

| Class | AUC | AUC (Sens > 80%) | Avg Precision | Sensitivity | Specificity | Dice | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| AKIEC | 0.933 | 0.873 | 0.704 | 0.732 | 0.924 | 0.675 | 0.627 | 0.952 |
| BCC | 0.975 | 0.960 | 0.838 | 0.951 | 0.919 | 0.758 | 0.630 | 0.992 |
| BEN_OTH | 0.978 | 0.953 | 0.505 | 0.429 | 0.998 | 0.545 | 0.750 | 0.992 |
| BKL | 0.881 | 0.713 | 0.746 | 0.750 | 0.865 | 0.664 | 0.595 | 0.929 |
| DF | 0.986 | 0.983 | 0.536 | 0.833 | 0.992 | 0.667 | 0.556 | 0.998 |
| INF | 0.841 | 0.722 | 0.164 | 0.364 | 0.985 | 0.364 | 0.364 | 0.985 |
| MAL_OTH | 0.820 | 0.717 | 0.518 | 0.400 | 0.993 | 0.571 | 1.000 | 0.987 |
| MEL | 0.957 | 0.935 | 0.820 | 0.821 | 0.950 | 0.688 | 0.593 | 0.984 |
| NV | 0.960 | 0.948 | 0.845 | 0.865 | 0.963 | 0.796 | 0.738 | 0.983 |
| SCCKA | 0.949 | 0.911 | 0.857 | 0.863 | 0.903 | 0.798 | 0.743 | 0.953 |
| VASC | 0.993 | 0.991 | 0.614 | 0.800 | 0.994 | 0.667 | 0.571 | 0.998 |
| **Mean** | **0.934** | **0.883** | **0.650** | **0.710** | **0.954** | **0.654** | **0.651** | **0.978** |

> Rare classes (INF: ~11 lesions, MAL_OTH: ~15 lesions, VASC: ~15 lesions) are severely underrepresented in MILK10k. Sensitivity figures for these classes should be interpreted with caution given the small sample sizes.

---

## Usage

This code is provided for research reproducibility. Users are responsible for ensuring any application complies with applicable laws and ethical guidelines.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
from PIL import Image
import torchvision.transforms as transforms
from huggingface_hub import hf_hub_download

# --- Model Definition ---
class DualConvNeXt(nn.Module):
    def __init__(self, num_classes=11, model_name='convnext_base'):
        super().__init__()
        self.clinical_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        self.derm_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        feat_dim = self.clinical_encoder.num_features
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim * 2, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )

    def forward(self, clinical, derm):
        c = self.clinical_encoder(clinical)
        d = self.derm_encoder(derm)
        return self.classifier(torch.cat([c, d], dim=1))

# --- Load Model ---
CLASS_NAMES = ['AKIEC', 'BCC', 'BEN_OTH', 'BKL', 'DF', 'INF',
               'MAL_OTH', 'MEL', 'NV', 'SCCKA', 'VASC']

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DualConvNeXt(num_classes=11)

weights_path = hf_hub_download(
    repo_id="tech-doc/ConvNeXt_Milk10k",
    filename="convnext_fold0_best.pth"
)
checkpoint = torch.load(weights_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().to(device)

# --- Preprocessing ---
transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# --- Inference ---
def predict(clinical_image_path: str, derm_image_path: str) -> dict:
    """
    Research inference only. Output must not be used for clinical decisions.

    Args:
        clinical_image_path: Path to clinical close-up photograph
        derm_image_path: Path to dermoscopic image

    Returns:
        dict with 'prediction', 'confidence', and 'probabilities'
    """
    clinical = transform(Image.open(clinical_image_path).convert('RGB')).unsqueeze(0).to(device)
    derm = transform(Image.open(derm_image_path).convert('RGB')).unsqueeze(0).to(device)

    with torch.no_grad():
        logits = model(clinical, derm)
        probs = F.softmax(logits, dim=1).squeeze().cpu().numpy()

    return {
        'prediction': CLASS_NAMES[probs.argmax()],
        'confidence': float(probs.max()),
        'probabilities': {c: float(p) for c, p in zip(CLASS_NAMES, probs)}
    }

# Example
result = predict('clinical.jpg', 'dermoscopy.jpg')
print(f"Prediction: {result['prediction']} ({result['confidence']:.1%})")
```

---

## Training Details

| Parameter | Value |
|---|---|
| Base model | `convnext_base` (ImageNet-22k pretrained via `timm`) |
| Image size | 384×384 px |
| Batch size | 32 |
| Optimiser | AdamW, lr=1e-4 |
| Scheduler | Cosine annealing with warm restarts |
| Loss | Cross-entropy with class weights + focal loss |
| Augmentation | Random flips, rotations, colour jitter, RandAugment |
| Folds | 5-fold stratified CV (seed 42) |
| Hardware | NVIDIA A100 (Google Colab) |
| Training time | ~4–6 hours per fold |

---

## Limitations

- **Single-dataset evaluation:** Trained and evaluated exclusively on MILK10k (~5,240 lesions). No external validation has been performed. Reported metrics should not be generalised beyond this distribution.
- **Severe class imbalance:** Rare classes (INF: ~11 lesions, MAL_OTH: ~15 lesions, VASC: ~15 lesions) are underrepresented. Performance on these classes is highly uncertain and may not be reproducible on different samples.
- **Paired-image requirement:** The model requires simultaneous dermoscopic and clinical photographs of the same lesion. Single-image inference is architecturally unsupported and was not evaluated.
- **Skin tone representation:** The composition of MILK10k with respect to Fitzpatrick phototype has not been fully characterised. Performance on darker skin tones (Fitzpatrick IV–VI) has not been validated.
- **Paediatric populations:** The model was not evaluated on paediatric patients.
- **Device variability:** Performance may degrade with imaging devices, magnifications, or lighting conditions not represented in the training data.
- **No prospective validation:** All reported metrics are from retrospective cross-validation. Prospective clinical validation would be required before any consideration of real-world use.

---

## Citation

If you use this model or the MILK10k dataset in your research, please cite:

```bibtex
@dataset{milk10k2025,
  author    = {MILK study team},
  title     = {MILK10k},
  year      = {2025},
  publisher = {ISIC Archive},
  doi       = {10.34970/648456}
}
```

---

## License

**CC BY-NC 4.0** — This model was trained on MILK10k data (CC BY-NC licensed). Non-commercial research use only. Any commercial application is prohibited without explicit permission.