---
license: cc-by-nc-4.0
tags:
- medical-imaging
- dermatology
- skin-lesion-classification
- convnext
- isic
- multi-modal
base_model: facebook/convnext-base-224-22k-1k
metrics:
- balanced-accuracy
- macro-f1
- auc-roc
language:
- en
pipeline_tag: image-classification
---

# ConvNeXt Dual-Modal Skin Lesion Classifier (ISIC 2025 / MILK10k)

This model classifies skin lesions into 11 diagnostic categories using paired dermoscopic and clinical photographs. It is part of the **Skin AI** application, where MedGemma calls it as a tool to obtain structured skin lesion classifications.

## Model Description

A dual-input ConvNeXt-Base architecture trained end-to-end on the MILK10k dataset (ISIC 2025 Challenge). The model processes a dermoscopic image and a clinical close-up photograph of the same lesion simultaneously, fusing their representations before classification.

- **Architecture:** Dual ConvNeXt-Base encoders (one per modality), late fusion
- **Input:** Paired dermoscopic + clinical images (384×384 px)
- **Output:** Softmax probabilities over 11 ISIC diagnostic classes
- **Training:** 5-fold cross-validation, macro F1 optimisation
- **Ensemble:** 5 models (one per fold), predictions averaged

## Intended Use

This model is intended for **research use only** as a component of the Skin AI application submitted to the MedGemma Impact Challenge. It is not validated for clinical use and must not be used to guide diagnosis or patient management.

**Intended users:** Researchers and developers building medical AI applications.

**Out of scope:** Direct clinical decision support, patient triage, or any deployment without further validation by qualified clinicians.

## Diagnostic Classes

| Class | Description |
|-------|-------------|
| AKIEC | Actinic keratosis / intraepithelial carcinoma |
| BCC | Basal cell carcinoma |
| BEN_OTH | Other benign lesion |
| BKL | Benign keratosis |
| DF | Dermatofibroma |
| INF | Inflammatory / infectious |
| MAL_OTH | Other malignant lesion |
| MEL | Melanoma |
| NV | Melanocytic nevus |
| SCCKA | Squamous cell carcinoma / keratoacanthoma |
| VASC | Vascular lesion |

## Performance

Evaluated on held-out validation folds from the MILK10k training data (5-fold cross-validation, stratified by lesion diagnosis).

### Aggregate Metrics

| Metric | Value |
|--------|-------|
| **Balanced multiclass accuracy** | **0.665** |
| Macro F1 (ConvNeXt alone) | 0.555 |
| Macro F1 (MedSigLIP + ConvNeXt ensemble) | 0.591 |
| ISIC 2025 leaderboard (Dice) | 0.538 |

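The two aggregate metrics can be reproduced from out-of-fold predictions with scikit-learn. A minimal sketch (the labels below are illustrative placeholders, not real evaluation data):

```python
from sklearn.metrics import balanced_accuracy_score, f1_score

# Illustrative placeholder labels -- real evaluation uses the
# out-of-fold predictions from the 5-fold cross-validation.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

# Balanced accuracy = mean per-class recall, so rare classes
# count as much as common ones.
bal_acc = balanced_accuracy_score(y_true, y_pred)

# Macro F1 averages per-class F1 scores with equal weight.
macro_f1 = f1_score(y_true, y_pred, average='macro')

print(f"balanced accuracy: {bal_acc:.3f}, macro F1: {macro_f1:.3f}")
# → balanced accuracy: 0.722, macro F1: 0.700
```

Both metrics weight every class equally, which is why they are preferred here over plain accuracy given the heavy class imbalance in MILK10k.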
### Per-Class Metrics (Validation, Single ConvNeXt)

| Class | AUC | AUC (Sens > 80%) | Avg. Precision | Sensitivity | Specificity | Dice | PPV | NPV |
|-------|-----|------------------|----------------|-------------|-------------|------|-----|-----|
| AKIEC | 0.933 | 0.873 | 0.704 | 0.732 | 0.924 | 0.675 | 0.627 | 0.952 |
| BCC | 0.975 | 0.960 | 0.838 | 0.951 | 0.919 | 0.758 | 0.630 | 0.992 |
| BEN_OTH | 0.978 | 0.953 | 0.505 | 0.429 | 0.998 | 0.545 | 0.750 | 0.992 |
| BKL | 0.881 | 0.713 | 0.746 | 0.750 | 0.865 | 0.664 | 0.595 | 0.929 |
| DF | 0.986 | 0.983 | 0.536 | 0.833 | 0.992 | 0.667 | 0.556 | 0.998 |
| INF | 0.841 | 0.722 | 0.164 | 0.364 | 0.985 | 0.364 | 0.364 | 0.985 |
| MAL_OTH | 0.820 | 0.717 | 0.518 | 0.400 | 0.993 | 0.571 | 1.000 | 0.987 |
| MEL | 0.957 | 0.935 | 0.820 | 0.821 | 0.950 | 0.688 | 0.593 | 0.984 |
| NV | 0.960 | 0.948 | 0.845 | 0.865 | 0.963 | 0.796 | 0.738 | 0.983 |
| SCCKA | 0.949 | 0.911 | 0.857 | 0.863 | 0.903 | 0.798 | 0.743 | 0.953 |
| VASC | 0.993 | 0.991 | 0.614 | 0.800 | 0.994 | 0.667 | 0.571 | 0.998 |
| **Mean** | **0.934** | **0.883** | **0.650** | **0.710** | **0.954** | **0.654** | **0.651** | **0.978** |

> **Note:** Rare classes (INF, MAL_OTH, BEN_OTH) show lower sensitivity due to class imbalance in the MILK10k dataset.

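For reference, the threshold-based columns in the table follow the standard one-vs-rest definitions (the AUC columns additionally require full score distributions and are not shown here). A self-contained sketch with made-up counts:

```python
def per_class_metrics(tp, fp, fn, tn):
    """One-vs-rest metrics matching the columns in the table above."""
    return {
        'sensitivity': tp / (tp + fn),        # recall / true positive rate
        'specificity': tn / (tn + fp),        # true negative rate
        'ppv': tp / (tp + fp),                # positive predictive value
        'npv': tn / (tn + fn),                # negative predictive value
        'dice': 2 * tp / (2 * tp + fp + fn),  # Dice == per-class F1
    }

# Made-up counts for illustration only.
m = per_class_metrics(tp=30, fp=10, fn=7, tn=953)
print({k: round(v, 3) for k, v in m.items()})
```

Note how NPV stays high for every class simply because negatives dominate in a one-vs-rest split, which is visible in the table as well.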
## Usage

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
from PIL import Image
import torchvision.transforms as transforms
from huggingface_hub import hf_hub_download

# --- Model Definition ---

class DualConvNeXt(nn.Module):
    def __init__(self, num_classes=11, model_name='convnext_base'):
        super().__init__()
        self.clinical_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        self.derm_encoder = timm.create_model(
            model_name, pretrained=False, num_classes=0
        )
        feat_dim = self.clinical_encoder.num_features
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim * 2, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )

    def forward(self, clinical, derm):
        c = self.clinical_encoder(clinical)
        d = self.derm_encoder(derm)
        return self.classifier(torch.cat([c, d], dim=1))


# --- Load Model ---

CLASS_NAMES = ['AKIEC', 'BCC', 'BEN_OTH', 'BKL', 'DF',
               'INF', 'MAL_OTH', 'MEL', 'NV', 'SCCKA', 'VASC']

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = DualConvNeXt(num_classes=11)

# Load weights (update repo_id to your HF repo)
weights_path = hf_hub_download(
    repo_id="tech-doc/ConvNeXt_Milk10k",
    filename="convnext_fold0_best.pth"
)
checkpoint = torch.load(weights_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().to(device)


# --- Preprocessing ---

transform = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])


# --- Inference ---

def predict(clinical_image_path: str, derm_image_path: str) -> dict:
    """
    Classify a skin lesion from paired images.

    Args:
        clinical_image_path: Path to clinical close-up photograph
        derm_image_path: Path to dermoscopic image

    Returns:
        dict with 'prediction' (class name), 'confidence' (float),
        and 'probabilities' (per-class dict)
    """
    clinical = transform(Image.open(clinical_image_path).convert('RGB')).unsqueeze(0).to(device)
    derm = transform(Image.open(derm_image_path).convert('RGB')).unsqueeze(0).to(device)

    with torch.no_grad():
        logits = model(clinical, derm)
        probs = F.softmax(logits, dim=1).squeeze(0).cpu().numpy()

    return {
        'prediction': CLASS_NAMES[probs.argmax()],
        'confidence': float(probs.max()),
        'probabilities': {c: float(p) for c, p in zip(CLASS_NAMES, probs)}
    }


# Example
result = predict('clinical.jpg', 'dermoscopy.jpg')
print(f"Prediction: {result['prediction']} ({result['confidence']:.1%})")
```
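The reported ensemble averages softmax probabilities across the five per-fold checkpoints rather than using a single fold. A sketch of that averaging step, usable with a list of loaded `DualConvNeXt` instances (checkpoint filenames for folds other than `convnext_fold0_best.pth` are an assumption; adjust to the files in the repo):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(models, clinical, derm):
    """Average softmax probabilities across per-fold models.

    `models` is a list of loaded models taking (clinical, derm) batches,
    e.g. one DualConvNeXt per fold checkpoint.
    """
    probs = torch.stack([F.softmax(m(clinical, derm), dim=1) for m in models])
    return probs.mean(dim=0)  # shape: (batch, num_classes)
```

Averaging probabilities (rather than logits or hard votes) keeps the output a valid distribution and tends to smooth over fold-specific errors.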

## Training Details

- **Base model:** `convnext_base` (ImageNet-22k pretrained, via timm)
- **Image size:** 384×384
- **Batch size:** 32
- **Optimiser:** AdamW, lr = 1e-4
- **Scheduler:** Cosine annealing with warm restarts
- **Loss:** Cross-entropy with class weights + focal loss
- **Augmentation:** Random flips, rotations, colour jitter, RandAugment
- **Folds:** 5-fold stratified CV (seed 42)
- **GPU:** NVIDIA A100 (Google Colab)
- **Training time:** ~4–6 hours per fold

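The class-weighted cross-entropy + focal loss combination listed above can be sketched as follows. This is a minimal sketch under assumptions: `gamma=2.0` and an equal 50/50 blend are illustrative defaults, not the exact training configuration.

```python
import torch
import torch.nn.functional as F

def focal_ce_loss(logits, targets, class_weights=None, gamma=2.0, blend=0.5):
    """Blend of class-weighted cross-entropy and focal loss.

    gamma=2.0 and the 50/50 blend are illustrative assumptions, not the
    exact configuration used to train this model.
    """
    ce_raw = F.cross_entropy(logits, targets, reduction='none')
    pt = torch.exp(-ce_raw)                  # model probability of the true class
    focal = ((1.0 - pt) ** gamma) * ce_raw   # down-weights easy, confident examples
    weighted_ce = F.cross_entropy(logits, targets, weight=class_weights,
                                  reduction='none')
    return ((1.0 - blend) * weighted_ce + blend * focal).mean()
```

Both terms target the same class-imbalance problem from different angles: the weights rescale rare classes directly, while the focal factor suppresses the gradient contribution of already-easy majority-class examples.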
## Limitations

- Trained exclusively on MILK10k (5,240 lesions); performance on external datasets has not been validated.
- Rare classes (INF: 11 lesions, MAL_OTH: 15 lesions, VASC: 15 lesions) are underrepresented, so sensitivity for these classes is lower.
- The model requires paired clinical + dermoscopic images; single-image inference is not supported.
- Not evaluated at scale on paediatric patients or on skin tones outside Fitzpatrick types I–III.

## Citation

If you use this model, please cite the MILK10k dataset:

```bibtex
@dataset{milk10k2025,
  author    = {MILK study team},
  title     = {MILK10k},
  year      = {2025},
  publisher = {ISIC Archive},
  doi       = {10.34970/648456}
}
```

## License

**CC BY-NC 4.0.** This model was trained on MILK10k data (CC-BY-NC licensed) and is restricted to non-commercial research use.