# skin_cancer_vit_classifier
## Overview
This model is a Vision Transformer (ViT) fine-tuned on high-resolution dermatoscopic imagery. It is designed to assist in the preliminary classification of skin lesions into seven diagnostic categories (actinic keratoses, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, and vascular lesions). It leverages self-attention to identify subtle morphological patterns in skin tissue that are characteristic of specific malignancies.
## Model Architecture
The model is based on the ViT-Base architecture (vit-base-patch16-224).
- Patches: The 224x224 input image is divided into 196 non-overlapping 16x16 patches (a 14x14 grid).
- Encoder: 12 Transformer blocks with multi-head self-attention.
- Classification Head: A linear layer on the [CLS] token that predicts one of the seven lesion types.
- Training: Fine-tuned on the HAM10000 dataset with heavy augmentation (rotation, flipping, and color jittering) to improve robustness to varying skin tones and lighting conditions.
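The patch arithmetic above can be sketched in plain NumPy. This is an illustrative reimplementation of ViT-style patch tokenization (not this model's actual preprocessing code, which lives inside the ViT implementation): a 224x224 RGB image becomes 196 patch tokens, each a flattened vector of 16x16x3 = 768 values, which the model then linearly projects into embeddings.

```python
import numpy as np

IMAGE_SIZE = 224
PATCH_SIZE = 16
CHANNELS = 3

def patchify(image: np.ndarray) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    assert h % PATCH_SIZE == 0 and w % PATCH_SIZE == 0
    n = h // PATCH_SIZE  # patches per side: 224 / 16 = 14
    patches = (
        image.reshape(n, PATCH_SIZE, n, PATCH_SIZE, c)
             .transpose(0, 2, 1, 3, 4)        # group the two patch-grid axes
             .reshape(n * n, PATCH_SIZE * PATCH_SIZE * c)
    )
    return patches

image = np.zeros((IMAGE_SIZE, IMAGE_SIZE, CHANNELS), dtype=np.float32)
patches = patchify(image)
print(patches.shape)  # (196, 768): 14*14 patch tokens, each 16*16*3 values
```

Each of the 196 rows corresponds to one token in the 12-layer Transformer encoder, alongside the prepended [CLS] token used by the classification head.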
## Intended Use
- Clinical Decision Support: Providing a secondary "opinion" for dermatologists during screening.
- Medical Research: Analyzing feature importance maps to understand which visual cues correlate with specific lesion types.
- Educational Tools: Assisting medical students in identifying visual markers of basal cell carcinoma (BCC) or melanoma.
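For screening-support use, the seven logits from the classification head must be turned into a diagnosis and a confidence score. A minimal sketch of that post-processing step is below; the label order and the example logits are assumptions for illustration (check the model's actual label mapping before relying on the indices).

```python
import numpy as np

# Abbreviations follow the HAM10000 convention; the ordering here is
# an assumption for illustration, not the model's verified id2label map.
LABELS = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

def predict(logits: np.ndarray) -> tuple[str, float]:
    """Softmax the raw logits and return (predicted label, confidence)."""
    z = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])

# Hypothetical logits from the classification head
label, conf = predict(np.array([0.1, 0.3, -0.2, 0.0, 2.1, 1.4, -1.0]))
print(label, round(conf, 3))  # mel 0.484
```

Note that a sub-0.5 top probability, as in this example, is exactly the kind of ambiguous output that should be escalated to a dermatologist rather than acted on directly.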
## Limitations
- Not for Primary Diagnosis: This model is an experimental tool and should not replace professional medical diagnosis.
- Demographic Bias: Performance may vary significantly on skin types underrepresented in the training data (Fitzpatrick skin types V and VI).
- Image Quality: Requires clear, focused dermatoscopic images. Standard smartphone photos without specialized lenses may lead to inaccurate results.