---
license: cc-by-nc-4.0
tags:
- skin-lesion
- dermoscopy
- classification
- convnext
- medical-imaging
- research
datasets:
- ISIC/MILK10k
metrics:
- f1
- auc
language:
- en
pipeline_tag: image-classification
---

# ConvNeXt Dual-Modal Skin Lesion Classifier (ISIC 2025 / MILK10k)

> **Research prototype — not validated for clinical use.**
> This model is released for reproducibility and research purposes only. It must not be used to guide clinical decisions, patient triage, or any diagnostic process. See [Limitations](#limitations) and [Out-of-Scope Uses](#out-of-scope-uses).

---

## Model Description

A dual-input ConvNeXt-Base architecture trained end-to-end on the [MILK10k dataset](https://doi.org/10.34970/648456) (ISIC 2025 Challenge). The model processes a dermoscopic image and a clinical close-up photograph of the same lesion simultaneously, fusing feature representations before classification. It was developed as a research component submitted to the MedGemma Impact Challenge.

| Property | Value |
|---|---|
| Architecture | Dual ConvNeXt-Base encoders (one per modality), late fusion |
| Input | Paired dermoscopic + clinical images (384×384 px each) |
| Output | Softmax probabilities over 11 ISIC diagnostic classes |
| Training | 5-fold stratified cross-validation, macro F1 optimisation |
| Ensemble | 5 models (one per fold), predictions averaged at inference |

---

## Intended Use

This model is released strictly for **non-commercial research and educational purposes**, as part of the SkinAI application submitted to the MedGemma Impact Challenge. It is provided to support reproducibility of the challenge submission and to enable further research into multi-modal skin lesion classification.

**Intended users:** Researchers and developers working on dermatology AI, machine learning in medical imaging, or related computational fields.

---

## Out-of-Scope Uses

The following uses are explicitly out of scope and are **not supported**:

- **Clinical diagnosis or decision support** — the model has not been validated for clinical deployment and must not influence patient care in any setting.
- **Patient triage or screening** — performance has only been evaluated on held-out folds of the MILK10k training distribution; generalisability to other populations, imaging devices, or clinical workflows is unknown.
- **Autonomous or semi-autonomous medical decision making** — any application in which model outputs could directly or indirectly affect patient management.
- **Deployment without independent clinical validation** — any production use would require prospective validation by qualified clinicians under appropriate regulatory oversight.

The performance metrics reported below reflect internal cross-validation on a single dataset and are **not sufficient evidence of clinical utility**.

---

## Diagnostic Classes

| Class | Description |
|---|---|
| AKIEC | Actinic keratosis / intraepithelial carcinoma |
| BCC | Basal cell carcinoma |
| BEN_OTH | Other benign lesion |
| BKL | Benign keratosis |
| DF | Dermatofibroma |
| INF | Inflammatory / infectious |
| MAL_OTH | Other malignant lesion |
| MEL | Melanoma |
| NV | Melanocytic nevus |
| SCCKA | Squamous cell carcinoma / keratoacanthoma |
| VASC | Vascular lesion |

---

## Performance

> **Important caveat:** All metrics below are from held-out validation folds of the MILK10k training dataset using 5-fold stratified cross-validation. They represent performance under distribution-matched conditions and should not be interpreted as estimates of real-world clinical performance. External validation has not been performed.

### Aggregate Metrics

| Metric | Value |
|---|---|
| Balanced Multiclass Accuracy | 0.665 |
| Macro F1 (ConvNeXt alone) | 0.555 |
| Macro F1 (MedSigLIP + ConvNeXt ensemble) | 0.591 |
| ISIC 2025 Leaderboard Score (Dice) | 0.538 |

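For reference, the two aggregate metrics above can be reproduced with scikit-learn as sketched below on toy labels. This is an illustrative computation only, not the official ISIC 2025 scoring script; balanced accuracy is the unweighted mean of per-class recall, and macro F1 is the unweighted mean of per-class F1 scores, both chosen because they are robust to the dataset's class imbalance.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

# Toy example: 4 ground-truth labels and 4 predicted labels over 2 classes.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])

# Balanced accuracy = mean per-class recall: (0.5 + 1.0) / 2 = 0.75
bal_acc = balanced_accuracy_score(y_true, y_pred)

# Macro F1 = unweighted mean of per-class F1 scores.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(bal_acc, macro_f1)
```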

### Per-Class Metrics (Validation, Single ConvNeXt Fold)

| Class | AUC | AUC (Sens > 80%) | Avg Precision | Sensitivity | Specificity | Dice | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| AKIEC | 0.933 | 0.873 | 0.704 | 0.732 | 0.924 | 0.675 | 0.627 | 0.952 |
| BCC | 0.975 | 0.960 | 0.838 | 0.951 | 0.919 | 0.758 | 0.630 | 0.992 |
| BEN_OTH | 0.978 | 0.953 | 0.505 | 0.429 | 0.998 | 0.545 | 0.750 | 0.992 |
| BKL | 0.881 | 0.713 | 0.746 | 0.750 | 0.865 | 0.664 | 0.595 | 0.929 |
| DF | 0.986 | 0.983 | 0.536 | 0.833 | 0.992 | 0.667 | 0.556 | 0.998 |
| INF | 0.841 | 0.722 | 0.164 | 0.364 | 0.985 | 0.364 | 0.364 | 0.985 |
| MAL_OTH | 0.820 | 0.717 | 0.518 | 0.400 | 0.993 | 0.571 | 1.000 | 0.987 |
| MEL | 0.957 | 0.935 | 0.820 | 0.821 | 0.950 | 0.688 | 0.593 | 0.984 |
| NV | 0.960 | 0.948 | 0.845 | 0.865 | 0.963 | 0.796 | 0.738 | 0.983 |
| SCCKA | 0.949 | 0.911 | 0.857 | 0.863 | 0.903 | 0.798 | 0.743 | 0.953 |
| VASC | 0.993 | 0.991 | 0.614 | 0.800 | 0.994 | 0.667 | 0.571 | 0.998 |
| **Mean** | **0.934** | **0.883** | **0.650** | **0.710** | **0.954** | **0.654** | **0.651** | **0.978** |

> Rare classes (INF: ~11 lesions, MAL_OTH: ~15 lesions, VASC: ~15 lesions) are severely underrepresented in MILK10k. Sensitivity figures for these classes should be interpreted with caution given the small sample sizes involved.
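The uncertainty from such small class counts can be quantified with a binomial confidence interval. As an illustrative sketch (the exact per-class counts are approximate), the INF sensitivity of 0.364 on ~11 lesions corresponds to roughly 4/11, whose 95% Wilson score interval spans most of the unit interval:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# INF sensitivity 0.364 on ~11 lesions is roughly 4 of 11 detected:
lo, hi = wilson_interval(4, 11)
print(f"95% CI for INF sensitivity: ({lo:.2f}, {hi:.2f})")  # roughly (0.15, 0.65)
```

An interval this wide is why single-point sensitivity figures for INF, MAL_OTH, and VASC should not be taken at face value.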

---

## Usage

This code is provided for research reproducibility. Users are responsible for ensuring any application complies with applicable laws and ethical guidelines.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
import torchvision.transforms as transforms
from PIL import Image
from huggingface_hub import hf_hub_download


# --- Model Definition ---

class DualConvNeXt(nn.Module):
    def __init__(self, num_classes=11, model_name='convnext_base'):
        super().__init__()
        # One encoder per imaging modality
        self.clinical_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        self.derm_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        feat_dim = self.clinical_encoder.num_features
        # Late fusion: concatenate both feature vectors, then classify
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim * 2, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )

    def forward(self, clinical, derm):
        c = self.clinical_encoder(clinical)
        d = self.derm_encoder(derm)
        return self.classifier(torch.cat([c, d], dim=1))


# --- Load Model ---

CLASS_NAMES = ['AKIEC', 'BCC', 'BEN_OTH', 'BKL', 'DF',
               'INF', 'MAL_OTH', 'MEL', 'NV', 'SCCKA', 'VASC']

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = DualConvNeXt(num_classes=11)

weights_path = hf_hub_download(
    repo_id="tech-doc/ConvNeXt_Milk10k",
    filename="convnext_fold0_best.pth"
)
checkpoint = torch.load(weights_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().to(device)


# --- Preprocessing ---

transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])


# --- Inference ---

def predict(clinical_image_path: str, derm_image_path: str) -> dict:
    """
    Research inference only. Output must not be used for clinical decisions.

    Args:
        clinical_image_path: Path to the clinical close-up photograph
        derm_image_path: Path to the dermoscopic image

    Returns:
        dict with 'prediction', 'confidence', and 'probabilities'
    """
    clinical = transform(Image.open(clinical_image_path).convert('RGB')).unsqueeze(0).to(device)
    derm = transform(Image.open(derm_image_path).convert('RGB')).unsqueeze(0).to(device)

    with torch.no_grad():
        logits = model(clinical, derm)
        probs = F.softmax(logits, dim=1).squeeze(0).cpu().numpy()

    return {
        'prediction': CLASS_NAMES[int(probs.argmax())],
        'confidence': float(probs.max()),
        'probabilities': {c: float(p) for c, p in zip(CLASS_NAMES, probs)}
    }


# Example
result = predict('clinical.jpg', 'dermoscopy.jpg')
print(f"Prediction: {result['prediction']} ({result['confidence']:.1%})")
```
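The snippet above loads a single fold checkpoint, whereas the reported ensemble averages predictions from all five folds. Fold ensembling might be sketched as below; the `ensemble_predict` helper is illustrative and not part of the released code, and the checkpoint filenames for folds beyond fold 0 are an assumption.

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, clinical, derm):
    """Average softmax probabilities across a list of fold models (illustrative)."""
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(m(clinical, derm), dim=1) for m in models]
        )
    return probs.mean(dim=0)  # shape: (batch, num_classes)

# Loading all five folds might look like this (filenames assumed):
# models = []
# for fold in range(5):
#     m = DualConvNeXt(num_classes=11)
#     path = hf_hub_download(repo_id="tech-doc/ConvNeXt_Milk10k",
#                            filename=f"convnext_fold{fold}_best.pth")
#     m.load_state_dict(torch.load(path, map_location=device)['model_state_dict'])
#     models.append(m.eval().to(device))
```

Averaging probabilities rather than logits keeps each fold's contribution on a comparable scale regardless of how confident its raw logits are.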

---

## Training Details

| Parameter | Value |
|---|---|
| Base model | `convnext_base` (ImageNet-22k pretrained via `timm`) |
| Image size | 384×384 px |
| Batch size | 32 |
| Optimiser | AdamW, lr = 1e-4 |
| Scheduler | Cosine annealing with warm restarts |
| Loss | Cross-entropy with class weights + focal loss |
| Augmentation | Random flips, rotations, colour jitter, RandAugment |
| Folds | 5-fold stratified CV (seed 42) |
| Hardware | NVIDIA A100 (Google Colab) |
| Training time | ~4–6 hours per fold |
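The loss row combines class-weighted cross-entropy with focal loss; the exact formulation used in training is not specified in this card. A minimal class-weighted focal loss, which reduces to ordinary cross-entropy at `gamma=0`, might look like:

```python
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, gamma=2.0, class_weights=None):
    """Focal loss with optional per-class weights (illustrative sketch).

    With gamma=0 and no weights this equals standard cross-entropy; gamma > 0
    down-weights easy examples, which helps under MILK10k-style class imbalance.
    """
    log_probs = F.log_softmax(logits, dim=1)
    # Log-probability of the true class for each sample
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt
    if class_weights is not None:
        loss = loss * class_weights[targets]
    # Note: plain batch mean; PyTorch's weighted cross_entropy instead
    # normalises by the summed sample weights.
    return loss.mean()
```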

---

## Limitations

- **Single-dataset evaluation:** Trained and evaluated exclusively on MILK10k (~5,240 lesions). No external validation has been performed. Reported metrics should not be generalised beyond this distribution.
- **Severe class imbalance:** Rare classes (INF: ~11 lesions, MAL_OTH: ~15 lesions, VASC: ~15 lesions) are underrepresented. Performance on these classes is highly uncertain and may not be reproducible on different samples.
- **Paired-image requirement:** The model requires simultaneous dermoscopic and clinical photographs of the same lesion. Single-image inference is architecturally unsupported and was not evaluated.
- **Skin tone representation:** The MILK10k dataset composition with respect to Fitzpatrick phototype has not been fully characterised. Performance across darker skin tones (Fitzpatrick IV–VI) has not been validated.
- **Paediatric populations:** The model was not evaluated on paediatric patients.
- **Device variability:** Performance may degrade with imaging devices, magnifications, or lighting conditions not represented in the training data.
- **No prospective validation:** All reported metrics are from retrospective cross-validation. Prospective clinical validation would be required before any consideration of real-world use.

---

## Citation

If you use this model or the MILK10k dataset in your research, please cite:

```bibtex
@dataset{milk10k2025,
  author    = {MILK study team},
  title     = {MILK10k},
  year      = {2025},
  publisher = {ISIC Archive},
  doi       = {10.34970/648456}
}
```

---

## License

**CC BY-NC 4.0** — This model was trained on MILK10k data (CC-BY-NC licensed). Non-commercial research use only. Any commercial application is prohibited without explicit permission.