skin_cancer_vit_classifier

Overview

This model is a Vision Transformer (ViT) fine-tuned on high-resolution dermatoscopic imagery. It assists in the preliminary classification of skin lesions into seven diagnostic categories, leveraging self-attention to identify subtle morphological patterns in skin tissue that are characteristic of specific malignancies.

Model Architecture

The model is based on the ViT-Base architecture (vit-base-patch16-224).

  • Patches: The input image (224x224) is divided into 16x16 patches.
  • Encoder: 12 layers of Transformer blocks with multi-head self-attention.
  • Classification Head: A linear layer on the [CLS] token that predicts one of the seven lesion types.
  • Training: Fine-tuned using the HAM10000 dataset with heavy augmentation (rotation, flipping, and color jittering) to account for varying skin tones and lighting conditions.
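The architecture described above can be sketched in a few lines with the Hugging Face transformers library. This is a minimal, randomly initialised stand-in that mirrors the stated configuration (224x224 input, 16x16 patches, 12 encoder layers, 7-way head); the HAM10000 label ordering used here is an assumption, not part of this model card.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

# Sketch of the described setup: ViT-Base geometry with a 7-way linear
# head on the [CLS] token. Label names/order are an assumed HAM10000 mapping.
labels = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
config = ViTConfig(
    image_size=224,          # input resolution
    patch_size=16,           # 16x16 patches -> 196 patch tokens + [CLS]
    num_hidden_layers=12,    # 12 Transformer encoder blocks
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)
model = ViTForImageClassification(config)  # randomly initialised sketch

pixel_values = torch.randn(1, 3, 224, 224)  # one preprocessed image
with torch.no_grad():
    logits = model(pixel_values).logits
print(logits.shape)  # torch.Size([1, 7])
```

The fine-tuned weights themselves would be loaded from the model repository rather than built from a fresh config; the snippet only illustrates the geometry of the classification head.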

Intended Use

  • Clinical Decision Support: Providing a second opinion for dermatologists during screening.
  • Medical Research: Analyzing feature importance maps to understand which visual cues correlate with specific lesion types.
  • Educational Tools: Assisting medical students in identifying visual markers of basal cell carcinoma (BCC) or melanoma.
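For the decision-support use case, the model's raw logits would typically be converted into a ranked list of class probabilities rather than a single hard label. The sketch below shows that post-processing step on an illustrative logit vector; the logit values and the HAM10000 label ordering are assumptions for demonstration only.

```python
import torch
import torch.nn.functional as F

# Turn 7 raw logits into a ranked "second opinion" with probabilities.
# Label order is an assumed HAM10000 mapping; logits are made-up example values.
labels = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
logits = torch.tensor([[0.2, 2.1, 0.3, -1.0, 1.4, 3.0, -0.5]])
probs = F.softmax(logits, dim=-1)[0]
ranked = sorted(zip(labels, probs.tolist()), key=lambda kv: kv[1], reverse=True)
for label, p in ranked[:3]:
    print(f"{label}: {p:.2%}")  # top-3 candidate diagnoses with confidence
```

Presenting the top few candidates with confidence scores, rather than only the argmax, keeps the clinician in the loop, which matches the "preliminary classification" framing of this model.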

Limitations

  • Not for Primary Diagnosis: This model is an experimental tool and should not replace professional medical diagnosis.
  • Demographic Bias: Performance may vary significantly on skin types underrepresented in the training data (Fitzpatrick scales V and VI).
  • Image Quality: Requires clear, focused dermatoscopic images. Standard smartphone photos without specialized lenses may lead to inaccurate results.